Abstract


Typestate systems ensure many desirable properties of imperative 
programs, including initialization of object fields and correct use of 
stateful library interfaces. Abstract sets with cardinality constraints 
naturally generalize typestate properties: relationships between the 
typestates of objects can be expressed as subset and disjointness 
relations on sets, and elements of sets can be represented as sets 
of cardinality one. In addition, sets with cardinality constraints 
provide a natural language for specifying operations and invariants 
of data structures. 

Motivated by these program analysis applications, this paper 
presents new algorithms and new complexity results for constraints 
on sets and their cardinalities. We study several classes of con- 
straints and demonstrate a trade-off between their expressive power 
and their complexity. 

Our first result concerns a quantifier-free fragment of Boolean 
Algebra with Presburger Arithmetic. We give a nondeterministic 
polynomial-time algorithm for reducing the satisfiability of sets 
with symbolic cardinalities to constraints on constant cardinalities, 
and give a polynomial-space algorithm for the resulting problem. 
The best previously existing algorithm runs in exponential space 
and nondeterministic exponential time. 

In a quest for more efficient fragments, we identify several 
subclasses of sets with cardinality constraints whose satisfiabil- 
ity is NP-hard. Finally, we identify a class of constraints that has 
polynomial-time satisfiability and entailment problems and can 
serve as a foundation for efficient program analysis. We give a sys- 
tem of rewriting rules for enforcing certain consistency properties 
of these constraints and show how to extract complete information 
from constraints in normal form. This result implies the soundness 
and completeness of our algorithms. 


1. Introduction 


Program analyses that reason about deep semantic properties are of 
great value for software development; the value of such analyses 
is growing with the adoption of language constructs that eliminate 
low-level program errors. Many deep semantic properties are natu- 
rally expressible in fragments of set theory, so constraint solving for 
such fragments is of interest. This paper presents new algorithms 
and improved complexity bounds for fragments of set theory. The 
starting point of our constraints is the boolean algebra of finite (but 
unbounded) sets. 

Sets in program analysis. The boolean algebra of finite sets 
is a fragment of set theory that allows the basic set operations 
of intersection, union, and complement on sets of uninterpreted 
elements. Although simple, it turns out that this fragment can 
express many properties of interest in program analysis. Examples
include typestate properties and public interfaces of data structures. 

Set specifications generalize typestate properties [29, 26]: the
fact that an object o is in the typestate t is represented as the set
membership of o in t. Through inclusion and disjointness constraints,
sets can also express relationships (such as hierarchy or
orthogonality) between different typestates. Objects can be
represented as sets of cardinality one using a cardinality constraint
|o| = 1, so set membership reduces to subset. Multiple set
memberships can then encode constraints such as |t| ≥ k for any
constant k.

Sets can also provide natural abstractions of container data
structures. When the content of a data structure is represented as an
abstract set s, an operation such as insertion can be characterized
by a postcondition s' = s ∪ e where e is the set corresponding
to the element being inserted. By expressing both typestates and
data structure abstractions, sets can be used to combine the results
of different analyses operating on the same program. Such an
approach allows us to combine the scalability of typestate analysis
with the precision of shape analysis and theorem proving [30, 28,
27, 46].

Sets with cardinality constraints. The use of the cardinality
operator on sets leads to a connection between set algebra
operations and integer linear arithmetic, as evidenced, for example, in
the condition |a ∪ b| = |a| + |b| for disjoint sets a and b. It is
therefore natural to consider constraints that combine integer linear
arithmetic with set algebra operations. These constraints constitute
the Quantifier-Free Boolean Algebra with Presburger Arithmetic,
or QFBAPA for short; they are the quantifier-free fragment of
BAPA constraints whose decision procedure and complexity we
have studied in [23, 22]. QFBAPA constraints can be used to verify
an invariant such as |a| = |b|, which allows us to conclude that
if a is nonempty, so is b, and therefore it is possible to call an
operation that removes an element from b. Similarly, if i is an integer
variable and s is a set, it is possible to verify an invariant |s| = i
stating that the integer i correctly maintains the size of the set s.
In our experience, specialized decision procedures such as [22] are
the only automated technique for deciding formulas with non-trivial
cardinality constraints. Currently, however, the complexity of these
decision procedures limits their applicability. In this paper we give
new algorithms for solving set cardinality constraints; these algo- 
rithms provide exponential improvements over existing approaches 
and make the checking of cardinality constraints in larger formulas 
more feasible. 
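
As a small illustration of the link between set operations and integer arithmetic described above, the following Python sketch checks the identity |a ∪ b| = |a| + |b| on concrete finite sets and maintains a size counter satisfying the invariant |s| = i. The names `union_cardinality_holds` and `CountedSet` are hypothetical and serve only this example; they do not come from the paper.

```python
def union_cardinality_holds(a: frozenset, b: frozenset) -> bool:
    """Check |a ∪ b| = |a| + |b|; this holds exactly when a and b are disjoint."""
    return len(a | b) == len(a) + len(b)


class CountedSet:
    """A container whose integer field `size` is meant to satisfy |content| = size."""

    def __init__(self):
        self.content = set()
        self.size = 0

    def insert(self, e):
        # Postcondition in the style of s' = s ∪ e: the counter grows only
        # when e was not already a member.
        if e not in self.content:
            self.content.add(e)
            self.size += 1

    def invariant(self) -> bool:
        """The invariant |s| = i relating the set to its size counter."""
        return len(self.content) == self.size
```

A verifier based on cardinality constraints would establish `invariant()` symbolically for all executions; the sketch only checks it on one concrete state.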

Figure 1. Quantifier-Free Formulas of Boolean Algebra with
Presburger Arithmetic (QFBAPA)


Our paper provides a systematic study of constraints on sets in
the presence of cardinalities. We study both more expressive and
less expressive fragments and demonstrate a trade-off between the
expressive power and the efficiency of the algorithms. The main
contributions of our paper are the following:


• PSPACE algorithm for QFBAPA. The best previously
known algorithms for QFBAPA [23, 22, 45] execute in
non-deterministic exponential time, and involve searching for an
exponentially large object. In this paper we first give a form of
bounded model property that shows that it is possible to replace
reasoning about symbolic cardinalities such as |a| = i ∧ |b| = i,
where i is an integer variable, with guessing sufficiently large
constant cardinalities, such as |a| = 1000 ∧ |b| = 1000. Moreover,
we give a space-efficient algorithm for solving the resulting
constraints on sets with large constant cardinalities. This
gives a PSPACE decision procedure for QFBAPA and is the
first contribution of this paper.


• A Polynomial-Time Class. Given that QFBAPA constraints
are NP-hard, the question remains whether there are interesting
fragments of sets with cardinalities which can be reasoned
about in polynomial time. In a quest for such fragments, we
identify several features of constraints, each of which leads to
NP-hardness. By eliminating these features we have discovered
a class (called i-trees) that has polynomial-time satisfiability
and entailment (subsumption) problems, while still supporting
subset, union, disjointness, and arbitrarily large cardinality con- 
straints. This class can therefore express generalized typestate 
constraints such as multiple orthogonal classifications into inde- 
pendent or disjoint sets. The identification of this polynomial- 
time class, and the development of algorithms for testing the 
satisfiability and subsumption of constraints in this class is the 
second contribution of this paper. While the resulting algo- 
rithms are efficient, the proof of their completeness is some- 
what lengthy, and involves characterizations of normal forms 
of i-trees and the construction of models for i-trees in normal 
form. We therefore only summarize the main ideas; we refer 
the reader to the full version of the paper [32] for details. Addi- 
tional proofs are also included in the Appendix. 


We proceed by defining the fragment QFBAPA in Section 2. We 
present a PSPACE algorithm for QFBAPA in Section 3, defining 
the simpler CBAC constraints and identifying their NP-complete 
fragment, CBASC constraints. 


2. Constraints on Sets with Cardinalities 


Boolean Algebra with Presburger Arithmetic. Figure 1 presents
the syntax of the constraints studied in this paper; we call these
formulas Quantifier-Free Boolean Algebra with Presburger Arithmetic
(QFBAPA). QFBAPA constraints contain two kinds of values:
integers and sets, each with corresponding applicable operations. The
sets are interpreted as subsets of some arbitrarily large finite set. s
denotes a set variable, i denotes an integer variable. The symbol
|B| denotes the cardinality of the set B and establishes the
connection between set and integer terms. MAXC is a special free variable
denoting the size of the universal set. If b is a set, $b^c$ denotes its
complement. K dvd T denotes that K divides T. K denotes
constants, encoded in binary: a constant k is encoded using O(log k)
bits. The symbol A in Figure 1 denotes atomic formulas; a literal is
an atomic formula or its negation.

A quantified version of this language (BAPA) is studied in
[23, 22], where we give an algorithm that establishes a doubly
exponential space upper bound on the complexity. Because quantified
BAPA subsumes Presburger arithmetic, the doubly exponential
nondeterministic time lower bound [15] applies to BAPA as well.


Preliminaries. If S is a finite set, |S| denotes the number of
elements in S. A literal is an atomic formula or its negation.
$\mathbb{Z} = \{\ldots, -1, 0, 1, \ldots\}$ is the set of integers,
$\mathbb{N} = \{0, 1, \ldots\}$ is the set of natural numbers.
$[a..b]$ denotes the set of integers $\{a, a+1, \ldots, b\}$.
If $f : A \to B$ is a function and $S \subseteq A$, we define
$f[S] = \{f(a) \mid a \in S\}$.

If A is a set, a superscript on A has several potential meanings;
the specific meaning should be clear from the context. $A^n$ for
$n \in \{1, 2, \ldots\}$ is the set of vectors $(a_1, \ldots, a_n)$
where $a_i \in A$ for $1 \le i \le n$, and $A^{m \times n}$ is the set
of matrices $[a_{pq}]$ with m rows and n columns with elements
$a_{pq}$ for $1 \le p \le m$ and $1 \le q \le n$. The expression $A^c$
denotes the complement of the set A. If $\alpha \in \{0, 1\}$, then
$A^{\alpha}$ denotes $A$ for $\alpha = 1$ and $A^c$ for $\alpha = 0$.

The relation $\equiv$ denotes the equality of the values of
metavariables denoting syntactic objects, so if $f_1$ and $f_2$ are
formulas, then $f_1 \equiv f_2$ means that they are the same formula.
In the context of inclusion diagrams (Section 4), $\equiv$ will denote
the semantic equivalence of diagrams (we use $=$ to denote the
equality of diagrams).


3. A PSPACE Algorithm for QFBAPA 


Verification conditions arising in program verification can often
be expressed using quantifier-free formulas, so it is natural to
examine whether more efficient algorithms exist for QFBAPA
constraints. When applied to QFBAPA formulas, existing algorithms
run in non-deterministic exponential time (NEXPTIME): the
algorithm of [45] requires nondeterministically guessing an
exponentially large object, whereas the algorithm α from [22] produces
an exponentially large quantifier-free Presburger arithmetic formula.
The question arises whether there exist algorithms that avoid
non-deterministically guessing exponentially large objects. We show
that this is indeed the case. Namely, we first show that Presburger
arithmetic formulas generated by the algorithm α from [22] can in
fact be solved in deterministic exponential time. Our result reduces
QFBAPA to a simpler system of CBAC constraints (shown in
Figure 3), then applies a theorem by Papadimitriou [36] in a novel
way. This leads to a deterministic EXPTIME decision procedure
for QFBAPA satisfiability, which is an improvement on previously
existing algorithms. Nevertheless, the question arises whether it is
possible to avoid the construction of an exponentially large
system of equations. It turns out that this is indeed possible: we
present an alternating polynomial-time (and therefore, PSPACE)
algorithm for QFBAPA. Therefore, it is possible to solve QFBAPA
using solvers for quantified boolean formulas [9, 48, 37].

Figures 2 and 4 present our PSPACE algorithm for QFBAPA. 
The algorithm has two phases. 

In the first phase, the non-deterministic polynomial-time algo- 
rithm in Figure 2 reduces QFBAPA constraints to a simpler class 
of constraints. We call these simpler constraints Conjunctions of 
Boolean Algebra expressions with Cardinalities (CBAC). CBAC 
constraints have a very simple syntactic structure (see Figure 3), 
but capture the key difficulty in solving QFBAPA: the need to con- 
sider exponentially large cardinalities on exponentially many set 
partitions. 

In the second phase, the algorithm in Figure 4 checks the satis- 
fiability of CBAC in alternating polynomial time and therefore in 
polynomial space. The key insight behind our algorithm is that it 
is possible to use a divide and conquer approach to avoid explicitly 
representing all possible regions in the Venn diagram. 
Let f be the input QFBAPA formula.

1. Replace each $\mathbb{Z}$-variable with a difference of two
   $\mathbb{N}$-variables:

     $C[i_1, \ldots, i_n] \to C[i_1^+ - i_1^-, \ldots, i_n^+ - i_n^-]$

   where $i_1^+, i_1^-, \ldots, i_n^+, i_n^-$ are fresh
   $\mathbb{N}$-variables.

2. Ensure that all set algebra expressions appear within
   cardinality constraints by normalizing with the following
   rules:

     $C[b_1 = b_2] \to C[b_1 \subseteq b_2 \land b_2 \subseteq b_1]$
     $C[b_1 \subseteq b_2] \to C[\,|b_1 \cap b_2^c| = 0\,]$

3. Eliminate divisibility constraints:

     $C[k \;\mathrm{dvd}\; t] \to C[k\,i = t]$, where $i$ is a fresh
     $\mathbb{N}$-variable.

4. Move all cardinality constraints to top level:

     $C[\,|b_1|, \ldots, |b_{m_1}|\,] \to f_1 \land f_2$

   where

     $f_1 \stackrel{\mathrm{def}}{=} C[i_1, \ldots, i_{m_1}]$
     $f_2 \stackrel{\mathrm{def}}{=} |1| = \mathrm{MAXC} \land \bigwedge_{j=1}^{m_1} |b_j| = i_j$

   and $i_1, \ldots, i_{m_1}$ are fresh $\mathbb{N}$-variables.

5. Let $p$ be a propositional formula such that
   $p(a_1, \ldots, a_{m_0}) = f_1$ for atomic formulas
   $a_1, \ldots, a_{m_0}$. Nondeterministically select the truth
   value $\alpha_j \in \{0, 1\}$ for each atomic formula $a_j$, so
   that $p(\alpha_1, \ldots, \alpha_{m_0})$ is true. Let
   $f_{11} = \bigwedge_{j=1}^{m_0} a_j^{\alpha_j}$.

6. For each conjunct $\lnot(t_1 = t_2)$ in $f_{11}$,
   non-deterministically replace the conjunct with one of the
   conjuncts $(t_1 + 1 \le t_2)$ or $(t_2 + 1 \le t_1)$.

7. Transform linear integer constraints to normal form:

     $C[\lnot(t_1 \le t_2)] \to C[t_2 + 1 \le t_1]$
     $C[t_1 \le t_2] \to C[t_1 - t_2 + i = 0]$
     $C[t_1 = t_2] \to C[\sum_{j=1}^{n_0} c_j i_j = k]$

8. Let $n_0$ be the number of integer variables in the entire
   formula. The resulting system is of the form:

     $A v = d \;\land\; \bigwedge_{j=1}^{m_1} |b_j| = i_{p_j}$

   where $A \in \mathbb{Z}^{m_0 \times n_0}$, $d \in \mathbb{Z}^{m_0}$,
   and $v = (i_1, \ldots, i_{n_0})$, where each $i_j$ is a variable
   ranging over $\mathbb{Z}$ and $1 \le p_1, \ldots, p_{m_1} \le m_1$
   are variables denoting cardinalities of sets. Let $S$ be the
   total number of set variables in $b_1, \ldots, b_{m_1}$. Let
   $m = m_0 + m_1$, $n = \max(n_0, 2^S)$, and let
   $M = n(ma)^{2m+1}$.

9. Non-deterministically select a vector $k = (k_1, \ldots, k_{n_0})$
   where $k_j \in \{0, 1, \ldots, M\}$ for $1 \le j \le n_0$, such
   that $A k = d$.

10. Call the CBAC decision procedure on
    $\bigwedge_{j=1}^{m_1} |b_j| = k_{p_j}$. If there exists a
    solution, then report the formula satisfiable.


Figure 2. An NP Algorithm for Reducing QFBAPA Constraints
to CBAC constraints of Figure 3
Figure 3. Conjunctions of Boolean Algebra expressions with
Cardinalities (CBAC)


Given a CBAC constraint

  $\bigwedge_{j=1}^{m_1} |b_j| = k_j$

where the free set variables of $b_1, \ldots, b_{m_1}$ are among
$s_1, \ldots, s_S$, run CBAC-check([], d) with
$d = (k_1, \ldots, k_{m_1})$.

proc CBAC-check($[v_1, \ldots, v_n]$, $d$) returns result
  where $v_1, \ldots, v_n$, result $\in \{0, 1\}$; $d \in \mathbb{N}^{m_1}$
  if ($n < S$) then
    existentially choose $d_0, d_1 \in \mathbb{N}^{m_1}$ such that $d_0 + d_1 = d$;
    universally do
      $r_1$ = CBAC-check($[v_1, \ldots, v_n, 0]$, $d_0$) and
      $r_2$ = CBAC-check($[v_1, \ldots, v_n, 1]$, $d_1$);
    return $r_1 \land r_2$;
  else
    let $p_j$ = eval($b_j$, $[s_1 \mapsto v_1, \ldots, s_S \mapsto v_S]$)
      for all ($1 \le j \le m_1$);
    $J_0 = \{d_j \mid p_j = 0\}$;
    $J_1 = \{d_j \mid p_j = 1\}$;
    return $J_0 \subseteq \{0\} \land |J_1| \le 1$.

proc eval($b$, $\alpha$) returns result
  where $b$: Boolean Algebra formula
        $\alpha : \{s_1, \ldots, s_S\} \to \{0, 1\}$
        result $\in \{0, 1\}$
  treating $b$ as a propositional formula,
  return the value of $b$ under assignment $\alpha$.


Figure 4. An Alternating Polynomial-Time (and PSPACE) Algorithm
for Checking the Satisfiability of CBAC Constraints


We next discuss our algorithm in more detail and argue that it is 
correct. We begin with the description of the steps of the algorithm 
in Figure 2, which reduces symbolic cardinalities to large constant 
cardinalities. 


1. Non-negative integers. To simplify the later steps, the first step
makes all integer variables range over non-negative integers $\mathbb{N}$, by
replacing each integer variable $i$ with a difference $i_1 - i_2$ of fresh
non-negative integer variables $i_1$, $i_2$.


2,3. Eliminating set equality and subset, and integer divisibility. 
The next step converts set equality and set subset into cardinality 
constraints. This step helps the later separation between the boolean 
algebra part and the integer linear arithmetic part. We then elimi- 
nate any divisibility relations using multiplication and a fresh vari- 
able. 
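
The rewrites of steps 2 and 3 can be sanity-checked on small concrete instances. The sketch below is an illustration only: it represents sets as Python `frozenset` values over an explicit finite universe, and the helper names (`powerset`, `subset_as_cardinality`, `divides_as_equation`, `check_rewrites`) are hypothetical. The divisibility check searches for the fresh natural-number witness by brute force, which is only sensible for small non-negative arguments.

```python
from itertools import combinations


def powerset(universe):
    """Yield every subset of `universe` as a frozenset."""
    items = list(universe)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def subset_as_cardinality(b1, b2, universe):
    """Step 2: b1 ⊆ b2 rewritten as |b1 ∩ b2^c| = 0."""
    return len(b1 & (universe - b2)) == 0


def divides_as_equation(k, t):
    """Step 3: k dvd t rewritten as 'k * i = t for some natural number i'.

    Assumes k >= 1 and t >= 0, so a witness i, if any, lies in [0..t].
    """
    return any(k * i == t for i in range(t + 1))


def check_rewrites(universe):
    """Compare the rewritten subset test with Python's built-in one on all pairs."""
    universe = frozenset(universe)
    return all(
        (b1 <= b2) == subset_as_cardinality(b1, b2, universe)
        for b1 in powerset(universe)
        for b2 in powerset(universe)
    )
```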


4. Flattening. The next step separates the formula into the
integer linear arithmetic part, denoted $f_1$, and the boolean algebra
part, denoted $f_2$. This step simply amounts to naming the
cardinality of each set by a fresh integer variable.


5,6. From quantifier-free formulas to conjunctions. An obvious
source of NP-completeness of QFBAPA is the presence of arbitrary
propositional combinations of atomic formulas. An effective way
of dealing with propositional combinations is to enumerate the
satisfying assignments of the propositional formula using a SAT
solver, and then solve the conjunctions of literals [16, 17]. Steps 5
and 6 of the non-deterministic algorithm in Figure 2 are an abstract
description of such a procedure. The goal of step 6 is to eliminate
disequalities, which involves a non-deterministic choice between the
two inequalities.


7. Normal form for integer constraints. The algorithm elimi- 
nates the remaining negations of atomic formulas and transforms 
linear constraints into normal form Av = d. 


8,9,10. Estimating sizes of integer variables. The resulting
system contains linear integer equations of the form
$\sum_{j=1}^{n_0} c_j i_j = k$, and set cardinality constraints of the
form $|b| = i$. The algorithm computes an upper bound $M$ on integer
variables in any potential solution of the system, using several
parameters: the number of conjuncts $m$, the number of integer
variables $n_0$ and the number of set variables $S$. The computation
of the upper bound is based on an observation that the satisfiability
of the conjunction of constraints $|b| = i$ can be reduced to the
satisfiability of equations of the form $\sum_{j=1}^{u} l_j = i$,
where variables $l_j$ denote sizes of set partitions (regions in the
Venn diagram) whose union is the set $b$; this is a specialization of
the idea in [22] to the case of quantifier-free formulas.

Let $s_1, \ldots, s_S$ be all set variables appearing in the formula.
Consider all partitions $\bigcap_{j=1}^{S} s_j^{\alpha_j}$ for
$\alpha_j \in \{0, 1\}$. For each such partition $b_p$, introduce a
fresh $\mathbb{N}$-variable $l_p$, which denotes the cardinality of
the cube $b_p$. Then consider a constraint of the form $|b| = i$. Each
set is a union of regions in the Venn diagram (by the disjunctive
normal form theorem), so suppose that
$b = b_{p_1} \cup \ldots \cup b_{p_u}$. Then replace the term
$|b| = i$ with the term $\sum_{q=1}^{u} l_{p_q} = i$. We use the term
“CBAC linear equations” to denote a system of linear equations
resulting from the constraints $|b| = i$ as described above.
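
To make the decomposition into Venn regions concrete, here is a minimal Python sketch. It assumes, purely for illustration, that a boolean algebra expression over set variables s_1, ..., s_S is given as a Python predicate over S bits, where bit j says whether a region lies inside s_j. The function names `regions` and `cardinality` and the dictionary `region_size` (playing the role of the variables l_p) are hypothetical.

```python
from itertools import product


def regions(b, num_sets):
    """Return the Venn regions (cubes) contained in the expression b.

    A region is a tuple of bits (a_1, ..., a_S): bit j is 1 if the region
    lies inside s_j and 0 if it lies inside the complement of s_j.
    """
    return [bits for bits in product((0, 1), repeat=num_sets) if b(*bits)]


def cardinality(b, num_sets, region_size):
    """Compute |b| as the sum of the sizes l_p of the regions that make up b.

    `region_size` maps a region to its size; missing regions have size 0.
    """
    return sum(region_size.get(bits, 0) for bits in regions(b, num_sets))


# Example: b = s1 ∪ s2 is the union of three of the four regions.
union = lambda s1, s2: s1 or s2
sizes = {(0, 1): 3, (1, 0): 2, (1, 1): 1}
```

With these sizes, the constraint |s1 ∪ s2| = i becomes the linear equation l_(0,1) + l_(1,0) + l_(1,1) = i, which here forces i = 6. Note that `regions` enumerates all 2^S assignments, which is exactly the exponential blow-up the rest of this section works to avoid.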

As a result, we obtain a system of $m_0 + m_1$ linear equations
over non-negative integers, where $m_0$ equations have a polynomial
number of variables, and $m_1$ equations (CBAC linear equations)
have exponentially many variables. It is easy to see that there exists
a surjective mapping of solutions of the original constraints on
sets onto solutions of the resulting linear equations (the mapping
computes the cardinality of each region of the Venn diagram).
Therefore, the original system is satisfiable if and only if the
resulting equations are satisfiable. Moreover, we have the following
fact.


FACT 1 (Papadimitriou [36]). Let $A$ be an $m \times n$ integer matrix
and $b$ an $m$-vector, both with entries from $[-a..a]$. Then the system
$Ax = b$ has a solution in $\mathbb{N}^n$ if and only if it has a
solution in $[0..M]^n$ where $M = n(ma)^{2m+1}$.
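
The bound of Fact 1 is large in value but small in representation. The following Python sketch, with hypothetical function names, computes M and the number of bits needed to write it down when n = max(n_0, 2^S) as in step 8 of Figure 2. Since log M = log n + (2m+1) log(ma), the bit length grows linearly in S and m even though n itself is exponential in S.

```python
def papadimitriou_bound(m: int, n: int, a: int) -> int:
    """M = n * (m*a)^(2m+1) for an m x n system with entries in [-a..a]."""
    return n * (m * a) ** (2 * m + 1)


def bits_for_bound(m: int, n0: int, num_set_vars: int, a: int) -> int:
    """Number of bits of M when n = max(n0, 2^S), as in step 8 of Figure 2."""
    n = max(n0, 2 ** num_set_vars)
    return papadimitriou_bound(m, n, a).bit_length()
```

For example, with m = 2 equations, a = 3, n_0 = 3 integer variables and S = 2 set variables, n = 4 and M = 4 · 6^5 = 31104, which fits in 15 bits.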


Fact 1 implies that the estimate $M$ computed in step 8 of the
algorithm in Figure 2 is a correct upper bound. Using this estimate, step
9 of the algorithm non-deterministically guesses the values of all
integer variables such that the original linear equations $Av = d$ are
satisfied. All this computation can be performed in nondeterministic
polynomial time, and (unlike [22]) does not involve explicitly
constructing a system with exponentially many equations. Having
picked the values of integer variables, including the variables $i$ on
the right hand side of constraints $|b| = i$, we obtain a conjunction
of constraints of the form $|b| = k$ where $k$ is a constant whose
binary representation has polynomially many bits; these are
precisely the CBAC constraints in Figure 3. We have therefore shown
the following.


LEMMA 1. The algorithm in Figure 2 reduces in non-deterministic 
polynomial time the satisfiability of a QFBAPA formula to the 
satisfiability of CBAC formulas. 


It remains to find an algorithm for CBAC constraints. 
A PSPACE algorithm for CBAC. One correct way to solve
CBAC constraints is to solve the associated CBAC linear equations.
This system has exponentially many variables, each of which
can take any value from $[0..M]$. Therefore, guessing the values of
each of these variables can be done in non-deterministic exponential
time; similar approaches not based on equations also require
guessing exponentially large objects [45]. Note, however, that there
are only polynomially many CBAC linear equations. Using the idea
of the proof of [36, Corollary 1], we can therefore show that a dynamic
programming algorithm can be used to solve the system in
deterministic exponential time. In fact, we can use the dynamic
programming algorithm from the proof of [36, Corollary 1]. Instead of
fixing the number of equations $m_1$ to be constant, we simply observe
that $m_1$ is polynomial in the size of the input, whereas the number
of variables is singly exponential. The bound $M$ therefore yields a
singly exponential deterministic time dynamic programming algorithm
for CBAC. While this is better than existing results, we show that an
even better result is achievable.
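
The following Python sketch illustrates the general shape of such a dynamic programming approach. It is written in the spirit of the idea above and is not claimed to be the construction from [36]: it processes the 2^S region variables one at a time and keeps the set of partial right-hand-side vectors reachable so far. As an assumption for this example, each expression b_j is passed as a Python predicate over S bits, and `cbac_dp` is a hypothetical name.

```python
from itertools import product


def cbac_dp(exprs, targets, num_sets):
    """Decide whether the conjunction of |b_j| = k_j is satisfiable.

    exprs[j] is a predicate over num_sets bits standing for b_j and
    targets[j] is the constant k_j. Time is exponential in num_sets; the
    table of partial sums has at most prod(k_j + 1) entries.
    """
    target = tuple(targets)
    reachable = {tuple(0 for _ in target)}  # partial sums achievable so far
    for bits in product((0, 1), repeat=num_sets):
        # Column of the CBAC linear system for this region: 0/1 per equation.
        coeff = tuple(1 if b(*bits) else 0 for b in exprs)
        if not any(coeff):
            continue  # region outside every b_j: its size is unconstrained
        step = set()
        for partial in reachable:
            current = partial
            # Try every size l >= 0 for this region that stays within target.
            while all(c <= t for c, t in zip(current, target)):
                step.add(current)
                current = tuple(c + a for c, a in zip(current, coeff))
        reachable = step
    return target in reachable
```

For instance, |s1| = 2 ∧ |s2| = 3 ∧ |s1 ∩ s2| = 1 is reported satisfiable, while |s1 ∪ s2| = 1 ∧ |s1| = 2 is not.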

Clearly, any algorithm that explicitly constructs CBAC equations
will require at least exponential time and space. Our solution
is therefore to adapt the dynamic programming algorithm to a
divide and conquer approach that always represents the equations in
terms of their original, polynomially sized, boolean algebra
expression. Such an algorithm runs in alternating polynomial time,
consuming polynomial space, and is presented in Figure 4. To see the
idea of our PSPACE algorithm, consider the CBAC linear system
of equations written in the vector form
$\sum_{j=1}^{2^p} a_j l_j = d$, where $d$, $a_j$ are vectors and $l_j$
are the variables for $1 \le j \le 2^p$. The algorithm guesses the
vectors $d_0, d_1 \in \mathbb{N}^{m_1}$ such that $d_0 + d_1 = d$, and
recursively solves two equations:

  $\sum_{j=1}^{2^{p-1}} a_j l_j = d_0 \;\land\; \sum_{j=2^{p-1}+1}^{2^p} a_j l_j = d_1$


This algorithm creates an OR-AND tree whose search gives the
answer to the original problem. A position in the tree is given by the
propositional assignment $[v_1, \ldots, v_n]$ to boolean variables. Each
leaf in the tree is given by a complete assignment $[v_1, \ldots, v_S]$
to set variables. Note that we never need to explicitly maintain
the system during the divide phase of the algorithm; it suffices to
determine in the leaf case $p = 0$ whether the coefficient $a_j$ is 0
or 1. The algorithm does this by simply evaluating each Boolean
algebra expression $b$ for the assignment $[v_1, \ldots, v_S]$.
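
A deterministic, brute-force realization of the procedure in Figure 4 can be sketched in Python as follows. The existential choice of d_0 and d_1 is replaced by enumeration of all splits and the universal branch by a conjunction, so this sketch runs in exponential time; it keeps only the current assignment prefix and vector on each stack frame, which is the source of the polynomial space bound. As an assumption for this example, each b_j is a Python predicate over S bits, and the names `splits` and `cbac_check` are illustrative.

```python
from itertools import product


def splits(d):
    """Yield all pairs (d0, d1) of natural-number vectors with d0 + d1 = d."""
    for d0 in product(*(range(k + 1) for k in d)):
        yield d0, tuple(k - x for k, x in zip(d, d0))


def cbac_check(exprs, num_sets, d, prefix=()):
    """Check satisfiability of the conjunction of |b_j| = d_j below `prefix`.

    `prefix` is the partial assignment [v_1, ..., v_n] of Figure 4.
    """
    if len(prefix) < num_sets:
        # Existential choice of the split; both halves must succeed.
        return any(
            cbac_check(exprs, num_sets, d0, prefix + (0,))
            and cbac_check(exprs, num_sets, d1, prefix + (1,))
            for d0, d1 in splits(d)
        )
    # Leaf: a single Venn region with one (unknown) size l.
    inside = {k for b, k in zip(exprs, d) if b(*prefix)}       # J_1
    outside = {k for b, k in zip(exprs, d) if not b(*prefix)}  # J_0
    # Equations not containing the region need 0; the others need a common l.
    return outside <= {0} and len(inside) <= 1
```

The top-level call is `cbac_check(exprs, S, (k_1, ..., k_m1))`. A realization through a quantified boolean formula solver, as the text suggests, would encode the same recursion symbolically rather than enumerating splits.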


THEOREM 1. The algorithm in Figure 4 checks the satisfiability of 
CBAC constraints in PSPACE. The algorithm given by Figures 2 
and 4 checks the satisfiability of QFBAPA constraints in PSPACE. 


Theorem 1 improves the existing algorithms for QFBAPA from
both a complexity theoretic and an implementation viewpoint. A
deterministic realization of previous NEXPTIME algorithms runs
in doubly exponential worst-case time and requires exponential
space; a deterministic realization of our new algorithm runs in
singly exponential time and consumes polynomial space. Previous
algorithms would require running a constraint solver such as a SAT
solver [47] on an exponentially large constraint; the new algorithm
can be realized by running a quantified boolean formula solver [48]
on a polynomially large constraint.


NP fragments of CBAC. We have seen that both CBAC and
QFBAPA constraints are in PSPACE. Both of these classes of
constraints are NP-hard, because the constraint $|b| = 1$ is satisfiable iff
$b$ corresponds to a satisfiable propositional formula. Moreover,
Lemma 1 shows that QFBAPA constraints are in NP iff CBAC
constraints are in NP. For some subclasses of CBAC constraints we can
indeed show membership in NP. Define conjunctions of boolean
algebra expressions with small cardinalities, denoted CBASC, to be
the same as CBAC but with constant integers encoded in unary
notation, where an integer $x$ is represented in space $O(x)$ as
opposed to $O(\log x)$; such encoding can therefore be exponentially
less compact.


LEMMA 2. The satisfiability of CBASC constraints is NP- 
complete. 


CBASC constraints are NP-hard because $|b| = 1$ is a CBASC
constraint. One way to prove membership in NP is to observe that
CBASC is subsumed by the language of set-valued fields which
was proven to be in NP [24, 25] by reduction to the universal class
of first-order logic formulas, which has the small model property
[7, Page 258]. Another way is to consider the notion of sparse
solutions of CBAC linear equations. An $M$-sparse solution is a solution
to CBAC linear constraints with at most $M$ non-zero elements. An
$M$-sparse solution to CBAC linear constraints with $2^S$ variables
can be encoded as an $M$-tuple of pairs $([v_1, \ldots, v_S], k)$ where the
propositional assignment $[v_1, \ldots, v_S]$ encodes one of the $2^S$
integer variables, and $k$ specifies the value of that integer variable.
This encoding is polynomial in $MSw$ where $w$ is the number of
bits for representing the largest component of the solution. For any
CBAC linear constraint $\bigwedge_{j=1}^{m} |b_j| = k_j$, each solution is
$M$-sparse where $M = \max(k_1, \ldots, k_m)$. For CBASC constraints,
$M$ is polynomial in the size of the CBASC representation because each
$k_j$ is encoded in unary, so sparse solutions can be guessed in
polynomial time. This proves that CBASC constraints are in NP.¹
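
The NP membership argument rests on the fact that a guessed sparse solution can be checked quickly. Below is a minimal Python sketch of such a check. It assumes, for illustration only, that each b_j is given as a predicate over S bits and that the certificate is a list of pairs (assignment, k); the name `verify_sparse_solution` is hypothetical.

```python
def verify_sparse_solution(exprs, targets, solution):
    """Check a sparse certificate for the conjunction of |b_j| = k_j.

    `solution` is a list of pairs (bits, k): the Venn region named by the
    assignment `bits` has size k; every region not listed has size 0.
    The check takes time polynomial in the length of the certificate.
    """
    seen = set()
    totals = [0] * len(exprs)
    for bits, k in solution:
        if bits in seen or k < 0:
            return False  # each region may be listed once, with a natural size
        seen.add(bits)
        for j, b in enumerate(exprs):
            if b(*bits):
                totals[j] += k  # the region contributes to |b_j|
    return totals == list(targets)
```

For |s1| = 2 ∧ |s2| = 3 ∧ |s1 ∩ s2| = 1, the certificate [((0, 1), 2), ((1, 0), 1), ((1, 1), 1)] is accepted.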


4. Inclusion Diagrams 


This section introduces inclusion diagrams (i-diagrams), a graph 
representation of CBAC constraints. Figure 5 shows a formula with 
sets and cardinalities and an equivalent i-diagram. I-diagrams allow 
us to naturally describe fragments of CBAC constraints and the al- 
gorithms for checking satisfiability and subsumption of these frag- 
ments. The basic idea of i-diagrams is to represent the subset partial 
order using a graph where sets are annotated with cardinalities, and 
then indicate the disjointness and union relations by constraints on 
direct subsets of a set. To efficiently represent equal sets, the nodes 
in the i-diagram stand not for set names, but for collections of set 
names that are guaranteed to be equal. Finally, we associate un- 
interpreted predicates with collections of nodes, representing the 
fact that elements of given sets satisfy the properties given by the 
predicate. The uninterpreted predicates illustrate a way to combine 
i-diagram representations with other constraints. 


DEFINITION 1 (i-diagrams). We fix a finite set SN of Set-Names,
and a finite set PN of predicate names. We denote by $PN^{\pm}$ the set
of atoms $\{+P, -P \mid P \in PN\}$.

An i-diagram (Inclusion-Diagram) is either the null-diagram $\bot_d$ or
a tuple $(\mathcal{S}, \emptyset_d, \mathrm{Sons}, \mathrm{Split}, \mathrm{Comp}, \mathrm{CInf}, \mathrm{CSup}, \Phi)$ such that:


• $\mathcal{S} \subseteq \mathcal{P}(SN)$ is a partition of SN containing
  (nonempty) equivalence classes of set names that are guaranteed to
  be equal, with $\emptyset_d \in \mathcal{S}$ the equivalence class
  corresponding to names of sets whose interpretation is the empty
  set $\emptyset$;


• $\mathrm{Sons} : \mathcal{S} \to \mathcal{P}(\mathcal{S})$ represents
  the subset relation; we define
  $S \leadsto S' \stackrel{\mathrm{def}}{\iff} S \in \mathrm{Sons}(S')$;
  then $(\mathcal{S}, \leadsto)$ is a graph, so we call the elements of
  $\mathcal{S}$ nodes, and the elements of $\leadsto$ edges; we write
  $\leadsto^{*}$ for the transitive closure of $\leadsto$;

• $\mathrm{Split}, \mathrm{Comp} : \mathcal{S} \to \mathcal{P}(\mathcal{P}(\mathcal{S}))$
  represent disjointness and completeness of set inclusions; if $S$ is
  a node, then $\mathrm{Split}(S)$ is a set of split views, where each
  view is a nonempty set of sons that represent pairwise disjoint
  sets, and $\mathrm{Comp}(S)$ is a set of complete views each of which
  is a set of nodes that represent sets whose union is equal to the
  father; we require

    $\bigcup \mathrm{Split}(S) = \mathrm{Sons}(S)$
    $\bigcup \mathrm{Comp}(S) \subseteq \mathrm{Sons}(S)$

  for all $S \in \mathcal{S}$;


• $\mathrm{CInf}, \mathrm{CSup} : \mathcal{S} \to \mathbb{N}$ specify
  lower and upper bounds on the cardinality of sets;


• $\Phi : \mathcal{S} \to \mathcal{P}(PN^{\pm})$ maps nodes to the
  uninterpreted unary predicates and their negations that are true
  for all sets of a node.


¹ Sparse solutions are interesting for general CBAC constraints as well. As
of yet we have no example of a CBAC constraint whose associated CBAC
equation system is satisfiable but has no sparse solutions; moreover, we can
generalize the notion of sparse solutions to solutions representable using
binary decision diagrams [8] while preserving polynomial-time verifiability.


D is such that $\mathrm{CInf}(\{s_1\}) = 1$, $\mathrm{CSup}(\{s_1\}) = 5$,
$\mathrm{Sons}(\{s_1\}) = \{\{s_5, s_6\}, \{s_4\}, \{s_3\}\}$,
$\mathrm{Comp}(\{s_1\}) = \{\{\{s_5, s_6\}, \{s_4\}, \{s_3\}\}\}$,
$\mathrm{Split}(\{s_1\}) = \{\{\{s_5, s_6\}\}, \{\{s_4\}, \{s_3\}\}\}$,
$\Phi(\{s_1\}) = \{+P\}$

and is equivalent to

  $s_0 = \emptyset \land s_5 = s_6 \land$
  $s_2 \cup s_3 \cup s_4 \cup s_5 \subseteq s_1 \land s_1 \subseteq s_5 \land s_3 \subseteq s_2 \land s_5 \subseteq s_4 \land$
  $s_3 \cap s_4 = \emptyset \land s_1 \subseteq s_3 \cup s_4 \cup s_5 \land s_4 \subseteq s_5 \land$
  $1 \le |s_1| \le 5 \land |s_4| = 1 \land |s_5| \le 3 \land |s_3| \le 2 \land$
  $\forall x \in s_1.\ P(x) \land \forall x \in s_3.\ \lnot Q(x)$


Figure 5. An example i-diagram D and an equivalent formula


To avoid confusion between set names, nodes (sets of set names),
and views (sets of nodes), we use lowercase letters $s, s_i, s'$ to
denote set names, uppercase letters $S, S_i, S'$ to denote nodes,
and letters $Q, C$ to denote views and sets of nodes in general.
When $D \neq \bot_d$ is a diagram, unless otherwise stated, we
name its components $\mathcal{S}, \emptyset_d, \mathrm{Sons}, \mathrm{Split}, \mathrm{Comp}, \mathrm{CInf}, \mathrm{CSup}, \Phi$,
and similarly we name the components of $D'$ as
$\mathcal{S}', \emptyset_d', \mathrm{Sons}', \mathrm{Split}', \mathrm{Comp}', \mathrm{CInf}', \mathrm{CSup}', \Phi'$.

In a graphical representation of an i-diagram, we represent
each element $S \in \mathcal{S}$ where $S = \{s_1, \ldots, s_n\}$ using
underlying sets $\{s_1, \ldots, s_n\}$. We represent inclusion
$S_1 \leadsto S_2$ by an arrow from $S_1$ to $S_2$. We represent a split
view $Q \in \mathrm{Split}(S)$ where $Q = \{S_1, \ldots, S_n\}$ with a
circle connected with undirected edges to $S_1, \ldots, S_n$ and an
arrow leading to $S$. We represent a complete view similarly, using a
filled square instead of a circle. For each node $S \in \mathcal{S}$ we
indicate its cardinality bounds by annotating the node with $[a..b]$
where $a = \mathrm{CInf}(S)$, $b = \mathrm{CSup}(S)$.
We represent $\Phi(S) = \{\pm P_1, \ldots, \pm P_n\}$ by annotating $S$
with $\pm P_1, \ldots, \pm P_n$. We represent
$\emptyset_d = \{s_1, \ldots, s_n\}$ by annotating the node
$\{s_1, \ldots, s_n\}$ with $\emptyset_d$.


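DEFINITION 2 (Semantics of i-diagrams). An interpretation of SN
and PN is a triple $(\Delta, \alpha, \Xi)$ where

• $\Delta$ is a finite set (the universe);
• $\alpha : \mathrm{SN} \to \mathcal{P}(\Delta)$ specifies the values of sets;
• $\Xi : \mathrm{PN} \to \mathcal{P}(\Delta)$ specifies the values of unary predicates.

An interpretation $I$ is a model for an i-diagram $D$, denoted $I \models D$,
iff $\forall s \in \emptyset_d.\ \alpha(s) = \emptyset$, and for all $S \in \mathcal{S}$ where $S = \{s_1,\ldots,s_n\}$,
the following conditions hold:

• $\alpha(s_1) = \ldots = \alpha(s_n)$;
  accordingly, define $\alpha(S) \overset{\text{def}}{=} \alpha(s_1) = \ldots = \alpha(s_n)$
• $\mathrm{CInf}(S) \le |\alpha(S)| \le \mathrm{CSup}(S)$
• $\forall P.\ (+P) \in \Phi(S) \Rightarrow \alpha(S) \subseteq \Xi(P)$
• $\forall P.\ (-P) \in \Phi(S) \Rightarrow \alpha(S) \subseteq \Xi(P)^c$
• $\forall S' \in \mathrm{Sons}(S).\ \alpha(S') \subseteq \alpha(S)$
• $\forall Q \in \mathrm{Split}(S).\ \forall S_1, S_2 \in Q.\ S_1 \neq S_2 \Rightarrow \alpha(S_1) \cap \alpha(S_2) = \emptyset$
• $\forall Q \in \mathrm{Comp}(S).\ \alpha(S) \subseteq \bigcup_{S_1 \in Q} \alpha(S_1)$
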
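We use the standard notions of satisfiability, subsumption (entailment),
and equivalence:

$D$ is satisfiable $\iff \exists I.\ I \models D$
$D' \models D \iff \forall I.\ (I \models D' \Rightarrow I \models D)$
$D' \equiv D \iff D' \models D \wedge D \models D'$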

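DEFINITION 3 (Explicit Disjointness).
We write $\mathrm{disj}_{D,S_0}(S_1, S_2)$ as a shorthand for

$$S_1 \neq S_2 \;\wedge\; \exists Q \in \mathrm{Split}(S_0).\ S_1, S_2 \in Q$$

We say that $S_1, S_2$ are explicitly disjoint, and we write
$\mathrm{disj}^*_D(S_1, S_2)$, iff

$$\exists S_1', S_2', S_0 \in \mathcal{S}.\ S_1 \leadsto^* S_1' \wedge S_2 \leadsto^* S_2' \wedge \mathrm{disj}_{D,S_0}(S_1', S_2')$$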
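
As an illustration of Definition 3, the sketch below tests explicit disjointness by collecting the ancestors of both nodes and looking for a split view that contains two distinct ancestors. It assumes nodes are represented by plain strings and that `sons` and `split` are dictionaries; these names and the encoding are hypothetical.

```python
def ancestors(sons, node):
    """All S' with node ~>* S' (node itself included); X ~> Y iff X in sons[Y]."""
    seen, stack = {node}, [node]
    while stack:
        current = stack.pop()
        for parent, children in sons.items():
            if current in children and parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def explicitly_disjoint(sons, split, s1, s2):
    """True iff some split view holds distinct ancestors of s1 and of s2."""
    up1, up2 = ancestors(sons, s1), ancestors(sons, s2)
    for views in split.values():
        for view in views:
            if any(a != b for a in up1 & view for b in up2 & view):
                return True
    return False
```

The ancestor search is quadratic in the worst case; for a tree-shaped diagram a parent map would make it linear.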


LEMMA 3. I-diagrams have the same expressive power as CBAC 
constraints. 


By “same expressive power” we here mean that there is a natural 
pair of mappings between the models of i-diagrams and solutions 
to CBAC constraints. 

Because nodes in i-diagrams are collections of set names, we 
can define the following operations. 


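DEFINITION 4 (Factor-i-diagram). Let $\rho \subseteq \mathcal{S} \times \mathcal{S}$ be an
equivalence relation on nodes. We define $D/\rho$ as follows. Define
$\bot_d/\rho = \bot_d$. Let $D = (\mathcal{S}, \emptyset_d, \mathrm{Sons}, \mathrm{Split}, \mathrm{Comp}, \mathrm{CInf}, \mathrm{CSup}, \Phi)$.
We define $D/\rho = D' = (\mathcal{S}', \mathrm{Sons}', \mathrm{Split}', \mathrm{Comp}', \mathrm{CInf}', \mathrm{CSup}', \Phi')$
as follows. Define $h$ so that if $\{S_1,\ldots,S_n\}$ is the equivalence
class of $S$ under $\rho$, then $h(S) = S_1 \cup \ldots \cup S_n$. If $Q \subseteq \mathcal{S}$, define
$h[Q] = \{h(S) \mid S \in Q\}$. Then let $\mathcal{S}' = h[\mathcal{S}]$.
Both $\mathcal{S}$ and $\mathcal{S}'$ are partitions, and given $S' \in \mathcal{S}'$ there is a unique
set $\{S_1,\ldots,S_n\} \subseteq \mathcal{S}$ such that $S' = S_1 \cup \ldots \cup S_n$. Then define:

  $\mathrm{CInf}'(S') = \max(\mathrm{CInf}(S_1),\ldots,\mathrm{CInf}(S_n))$
  $\mathrm{CSup}'(S') = \min(\mathrm{CSup}(S_1),\ldots,\mathrm{CSup}(S_n))$
  $\mathrm{Sons}'(S') = h[\mathrm{Sons}(S_1) \cup \ldots \cup \mathrm{Sons}(S_n)]$
  $\Phi'(S') = \Phi(S_1) \cup \ldots \cup \Phi(S_n)$
  $\mathrm{Split}'(S') = \{h[Q] \mid Q \in \mathrm{Split}(S_1) \cup \ldots \cup \mathrm{Split}(S_n)\}$
  $\mathrm{Comp}'(S') = \{h[Q] \mid Q \in \mathrm{Comp}(S_1) \cup \ldots \cup \mathrm{Comp}(S_n)\}$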
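
The factor construction can be sketched as follows. This is an illustrative reading of Definition 4, assuming the same hypothetical dictionary encoding of diagrams as above (nodes are frozensets of set names) and assuming the equivalence relation is given as a list of pairwise disjoint classes of nodes to merge; taking the image of the empty node as the new empty node is likewise an assumption.

```python
def factor(d, classes):
    """Quotient of diagram d by the equivalence whose non-trivial classes are
    given in `classes` (pairwise disjoint sets of nodes)."""
    h = {S: S for S in d["nodes"]}              # nodes not mentioned stay alone
    for cls in classes:
        merged = frozenset().union(*cls)        # h(S) = S_1 u ... u S_n
        for S in cls:
            h[S] = merged

    def image(Q):
        return frozenset(h[S] for S in Q)       # h[Q]

    out = {"nodes": set(h.values()), "empty": h[d["empty"]],
           "cinf": {}, "csup": {}, "sons": {}, "phi": {}, "split": {}, "comp": {}}
    for S, T in h.items():
        # Lower bounds combine by max, upper bounds by min.
        out["cinf"][T] = max(out["cinf"].get(T, 0), d["cinf"][S])
        out["csup"][T] = min(out["csup"].get(T, d["csup"][S]), d["csup"][S])
        out["sons"].setdefault(T, set()).update(h[X] for X in d["sons"].get(S, ()))
        out["phi"].setdefault(T, set()).update(d["phi"].get(S, ()))
        out["split"].setdefault(T, set()).update(image(Q) for Q in d["split"].get(S, ()))
        out["comp"].setdefault(T, set()).update(image(Q) for Q in d["comp"].get(S, ()))
    return out
```

Merging a view into a single node can create self edges and singleton views; in the paper these are cleaned up afterwards by Simplify, which this sketch does not attempt.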


DEFINITION 5 (Merge). For any i-diagram $D$ we define the
i-diagram $D[\mathrm{Merge}(Q)] = D/\rho$ for the equivalence relation
$\rho = \{(S_1, S_2) \mid S_1, S_2 \in Q\} \cup \{(S, S) \mid S \in \mathcal{S}\}$.


In the sequel we impose the following restrictions on the form 
of i-diagrams. 
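DEFINITION 6 (Simple Diagrams). A diagram $D$ is simple iff
$D = \bot_d$ or all of the following conditions hold for all $S \in \mathcal{S}$:
a) $(\mathcal{S}, \leadsto)$ has no cycles, in particular $S \notin \mathrm{Sons}(S)$
b) $\emptyset_d \notin \mathrm{Sons}(S)$
c) $\emptyset \notin \mathrm{Split}(S) \wedge \emptyset \notin \mathrm{Comp}(S)$
d) $\forall Q, Q'.\ Q \in \mathrm{Split}(S) \wedge Q' \subsetneq Q \Rightarrow Q' \notin \mathrm{Split}(S)$
e) $\forall Q, Q'.\ Q \in \mathrm{Comp}(S) \wedge Q' \supsetneq Q \Rightarrow Q' \notin \mathrm{Comp}(S)$
f) $\mathrm{CSup}(\emptyset_d) = 0$, $\mathrm{Sons}(\emptyset_d) = \Phi(\emptyset_d) = \emptyset$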
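
proc Simplify($D$):
  1. use fixpoint iteration to compute $\rho$ as
     the smallest equivalence relation such that:
     1.1. $S_1 \leadsto^* S_2 \wedge S_2 \leadsto^* S_1 \Rightarrow (S_1, S_2) \in \rho$
     1.2. $(S, \emptyset_d) \in \rho \wedge S_1 \leadsto^* S \Rightarrow (S_1, \emptyset_d) \in \rho$
     1.3. $\emptyset \in \mathrm{Comp}(S) \Rightarrow (S, \emptyset_d) \in \rho$
     1.4. $\mathrm{disj}_{D,S_0}(S_1, S_2) \wedge (S_1, S_2) \in \rho \Rightarrow (S_2, \emptyset_d) \in \rho$
     1.5. $\mathrm{disj}_{D,S_0}(S_1, S_2) \wedge (S_0, S_1) \in \rho \Rightarrow (S_2, \emptyset_d) \in \rho$
     1.6. $\{S_1\} \in \mathrm{Comp}(S) \Rightarrow (S, S_1) \in \rho$
  2. $D := D/\rho$
  3. $\mathrm{Split}(S) \leftarrow \{Q - \{\emptyset_d\} \mid Q \in \mathrm{Split}(S),\ S \notin Q\}$
     $\mathrm{Comp}(S) \leftarrow \{Q - \{\emptyset_d\} \mid Q \in \mathrm{Comp}(S),\ S \notin Q\}$
     $\mathrm{Sons}(S) \leftarrow \mathrm{Sons}(S) - \{\emptyset_d, S\}$
  4. $\mathrm{Split}(S) \leftarrow \mathrm{Split}(S) - \{\emptyset\} - \{Q \mid \exists Q' \in \mathrm{Split}(S),\ Q' \supsetneq Q\}$
     $\mathrm{Comp}(S) \leftarrow \mathrm{Comp}(S) - \{\emptyset\} - \{Q \mid \exists Q' \in \mathrm{Comp}(S),\ Q' \subsetneq Q\}$
  5. $\mathrm{CSup}(\emptyset_d) \leftarrow 0$
     $\Phi(\emptyset_d) \leftarrow \emptyset$
     $\mathrm{Sons}(\emptyset_d) \leftarrow \emptyset$
     $\mathrm{Comp}(\emptyset_d) \leftarrow \emptyset$
     $\mathrm{Split}(\emptyset_d) \leftarrow \emptyset$
  6. return $D$

Here $[a \leftarrow b]$ denotes the result of updating the component $a$ of
i-diagram $D$ with value $b$; steps 3 and 4 are applied for every $S \in \mathcal{S}$.

Figure 6. Polynomial-time algorithm Simplify to compute an
equivalent simple i-diagram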


Simplicity eliminates redundancy from diagrams, but does not re- 
strict their expressive power, as the following lemma shows. 


LEMMA 4. For every i-diagram D we can obtain an equivalent 
simple i-diagram using the polynomial-time algorithm Simplify 
in Figure 6. 


5. Sources of NP Hardness and Definition of 
I-Trees 


The satisfiability of i-diagrams is NP-hard because i-diagrams have 
the same expressive power as CBAC constraints. We have observed 
that the general directed acyclic graph structure of i-diagrams
allows us to encode NP-complete problems; this motivates the
following two restrictions.


DEFINITION 7.

An i-diagram $D$ is tree shaped iff
$(\mathcal{S}, \leadsto)$ is a tree (with an additional isolated node $\emptyset_d$).

An i-diagram $D$ has independent views iff
for all $Q_1, Q_2 \in \mathrm{Split}(S) \cup \mathrm{Comp}(S)$ at least one of the
following two conditions holds:

• $Q_1 \cap Q_2 = \emptyset$
• $Q_1 \in \mathrm{Split}(S) \wedge Q_2 \in \mathrm{Comp}(S) \wedge Q_1 \subseteq Q_2$.


Recall that, by Lemma 4, it suffices to consider i-diagrams with 
acyclic graphs of the subset relation. The tree shape condition is 
then a natural next restriction on the structure of i-diagrams. How- 
ever, due to the presence of Split and Comp, the tree shape condi- 
tion by itself does not reduce the expressive power of i-diagrams, 
and further restrictions are necessary. The independent views
condition extends the tree condition to the entire graphical
representation of i-diagrams, including the circles and squares that
represent Split and Comp views. The conjunction of these two
conditions can be expressed by saying that the graphical
representation of the i-diagram is a tree.


REMARK 1. We can express the combination of the conditions:
being simple, being tree shaped, and having independent views by
saying that there are only four kinds of edges in the corresponding
graphical representation:²

• from an element $S \in \mathcal{S} - \{\emptyset_d\}$ to a circle,

• from a circle to a square, indicating that all nodes of a split view
  belong to a complete view,

• from a circle to an element $S \in \mathcal{S} - \{\emptyset_d\}$,

• from a square to an element $S \in \mathcal{S} - \{\emptyset_d\}$.


Unfortunately, the restrictions on tree shape and independent views 
are not sufficient to guarantee a polynomial-time decision proce- 
dure in the presence of predicates associated with nodes. The rea- 
son is that the ability to encode disjointness of arbitrary sets leads to 
NP-hardness, yet even with tree structure and independent views it
is possible to assert that two arbitrary sets $S_1$ and $S_2$ are disjoint by
letting $(+P) \in \Phi(S_1)$ and $(-P) \in \Phi(S_2)$ for some uninterpreted
predicate $P$. A simple way to avoid this problem is to require that
$\Phi$ contains only positive atoms $(+P)$. A more flexible restriction
is the following.


DEFINITION 8. An i-diagram $D$ has independent signatures iff
for every pair of distinct nodes $S_1, S_2$ such that $(-P) \in \Phi(S_1)$
and $(+P) \in \Phi(S_2)$ for some $P \in \mathrm{PN}$, at least one of the following
two conditions holds:

1. $S_1$ and $S_2$ are explicitly disjoint, that is, $\mathrm{disj}^*_D(S_1, S_2)$;
2. $S_1$ and $S_2$ have compatible signatures, that is, there exists a
   node $S$ such that

   $$S_1 \leadsto^* S \;\wedge\; S_2 \leadsto^* S \;\wedge\; \mathrm{Sig}(S_1) \cap \mathrm{Sig}(S_2) \subseteq \mathrm{Sig}(S)$$

   where $\mathrm{Sig}(S) = \{P \mid (+P) \in \Phi(S) \vee (-P) \in \Phi(S)\}$.


The independent signatures condition ensures that any disjointness
conditions are either 1) a result of the fact that the ancestors of
$S_1$ and $S_2$ are explicitly stated as disjoint, or 2) a result of a
contradictory predicate assignment (the case when $S_1$ and $S_2$ have
compatible signatures, so there exists a parent that resolves which
of $(+P)$ or $(-P)$ holds for both $S_1$ and $S_2$).

The discussion above leads to the definition of i-trees, for which 
we will give polynomial-time algorithms for satisfiability and sub- 
sumption in Sections 6 and 7. 


DEFINITION 9 (i-trees, iT). An i-tree $T$ is a simple i-diagram
such that $T = \bot_d$ or such that all of the following three conditions
hold:


1. T is tree shaped 


2. T has independent views 
3. T has independent signatures. 


We denote by iT the set of i-trees. 


The following theorem justifies why all three conditions in our def- 
inition of i-trees are necessary. Its proof is based on a reduction 
from graph 3-colorability, which can be encoded using slightly dif- 
ferent i-diagrams for each of the three cases. The common property 
of these diagrams is that they can encode disjointness of arbitrary 
pairs of nodes. 


² As a result, we can recognize this structure in linear time using, for
example, a tree-automaton [12].

THEOREM 2. Omitting any one of the three conditions from
Definition 9 yields a class of diagrams whose satisfiability is NP-hard.


We note that in addition to NP-hardness, the omission of tree 
shaped or independent views properties in fact retains the full 
expressive power of CBAC constraints, using a similar argument 
as in Lemma 3. 

Our ability to specify i-trees as a natural subclass of i-diagrams 
justifies the definition of i-diagrams themselves. For example, the 
definition of i-trees would have been more complex had we chosen
to represent disjointness using a binary relation $s_1 \cap s_2 = \emptyset$.

Let us also observe that, despite the imposed restrictions,
i-trees are fairly expressive. In particular they can express
hierarchical decomposition of a set given by a node $S$ into disjoint sets
$S_1,\ldots,S_n$, by letting $\{S_1,\ldots,S_n\} \in \mathrm{Split}(S) \cap \mathrm{Comp}(S)$.
Despite the independent views condition, we can have multiple
orthogonal decompositions, so $\{S_1',\ldots,S_m'\} \in \mathrm{Split}(S) \cap \mathrm{Comp}(S)$ for
$\{S_1',\ldots,S_m'\} \cap \{S_1,\ldots,S_n\} = \emptyset$. This allows i-trees to naturally
express generalized typestate constraints.


6. Deciding the Satisfiability of I-Trees 


In this section we prove that the satisfiability of i-trees is decidable
in polynomial time. For this purpose we introduce a set of weak
consistency conditions $C_k$ (Definition 10) such that:

(6.1) We can enforce weak consistency for any satisfiable i-tree
      using a rewriting system $R^w$ (Definition 11) with the following
      properties (Lemma 5):

      • $R^w$ is semantics-preserving;

      • if a non-$\bot_d$ i-tree is in $R^w$ normal form, then it satisfies
        the weak consistency conditions;

      • for a particular strategy (Figure 9) the system $R^w$
        terminates in polynomial time.

(6.2) Every i-tree that satisfies the weak consistency conditions is
      satisfiable; Lemma 6 gives an algorithm for constructing a model for
      any i-tree that satisfies the weak consistency conditions.


Figure 9 summarizes the polynomial-time satisfiability decision 
procedure whose correctness (Theorem 3) follows from the results 
of this section. 


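DEFINITION 10 (Weak Consistency). An i-tree $T$ satisfies weak
consistency iff $T \neq \bot_d$ and $T$ satisfies the following conditions
for all $S \in \mathcal{S}$:

  $\forall S' \in \mathrm{Sons}(S).\ \Phi(S') \supseteq \Phi(S)$                          ($C_1$)
  $\mathrm{CSup}(S) > 0 \Rightarrow \forall P \in \mathrm{PN}.\ \{+P, -P\} \not\subseteq \Phi(S)$        ($C_2$)
  $\forall Q \in \mathrm{Comp}(S).\ \mathrm{CSup}(S) \le \Sigma(\mathrm{CSup}[Q])$              ($C_3$)
  $\forall Q \in \mathrm{Split}(S).\ \mathrm{CInf}(S) \ge \Sigma(\mathrm{CInf}[Q])$             ($C_4$)
  $\mathrm{CInf}(S) \le \mathrm{CSup}(S)$                                        ($C_5$)

6.1 A Rewriting System $R^w$ for Enforcing Weak Consistency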


We introduce the following rewriting system to enforce weak con- 
sistency properties when possible. 


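DEFINITION 11 (System $R^w$). For each tuple (k, name,
condition, effect) in Figure 7, we define a rewriting rule on
i-diagrams by

$$D \xrightarrow[\text{name}]{\text{spot}} D' \;\overset{\text{def}}{\iff}\; (D \neq \bot_d \wedge \text{condition} \wedge D' = D[\text{effect}])$$

for each assignment spot of the free variables appearing in the
condition column. We define $R_k$ by

$$D \xrightarrow[R_k]{} D' \;\overset{\text{def}}{\iff}\; \exists\, \text{spot}.\ D \xrightarrow[\text{name}]{\text{spot}} D'$$

We define $R^w$ as the union of the $R_j$ for $1 \le j \le 5$.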
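
 k | name  | conditions                                                                                      | effect
---+-------+-------------------------------------------------------------------------------------------------+---------------------------------
 1 | DnPhi | a1) $S' \in \mathrm{Sons}(S)$; b1) $\phi_n = \Phi(S') \cup \Phi(S)$; c1) $\phi_n \not\subseteq \Phi(S')$ | $\Phi(S') \leftarrow \phi_n$
 2 | Unsat | a2) $\{+P, -P\} \subseteq \Phi(S)$; b2) $n = 0$; c2) $\mathrm{CSup}(S) > n$                     | $\mathrm{CSup}(S) \leftarrow n$
 3 | UpSup | a3) $Q \in \mathrm{Comp}(S)$; b3) $n = \Sigma(\mathrm{CSup}[Q])$; c3) $\mathrm{CSup}(S) > n$    | $\mathrm{CSup}(S) \leftarrow n$
 4 | UpInf | a4) $Q \in \mathrm{Split}(S)$; b4) $n = \Sigma(\mathrm{CInf}[Q])$; c4) $\mathrm{CInf}(S) < n$   | $\mathrm{CInf}(S) \leftarrow n$
 5 | Error | a5) $\mathrm{CInf}(S) > \mathrm{CSup}(S)$                                                       | $D \leftarrow \bot_d$

Figure 7. System $R^w$ for ensuring weak consistency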


[Figure omitted: an i-tree over nodes A, B, C, D with cardinality-bound
annotations, rewritten by the steps DnPhi, Unsat, UpSup, Error.]

Figure 8. An example sequence of rewriting steps for $R^w$


Figure 8 shows an example sequence of rewriting steps applied to
an i-tree.

LEMMA 5 (Properties of $R^w$).
1. $R^w$ is iT-stable, that is
   $T \in iT \;\wedge\; T \xrightarrow[R^w]{} T' \;\Rightarrow\; T' \in iT$
2. $R^w$ preserves the semantics, that is
   $D \xrightarrow[R^w]{} D' \;\Rightarrow\; D \equiv D'$
3. $R^w$ enforces weak consistency when possible, that is, a
   diagram $D$ in $R^w$ normal form is either equal to $\bot_d$ or it is weakly
   consistent.
4. $R^w$ terminates in polynomial time for the strategy corresponding to
   the algorithm $R^w_{NF}$ in Figure 9.


Proof sketch.
1. Follows easily from the fact that $R^w$ rules do not modify Sons,
   Split, Comp.
2. Follows by construction of $R^w$ rules. Suppose $D \xrightarrow[R_i]{} D'$. Then
   $D \models D'$ follows from conditions $a_i$ ($1 \le i \le 5$), and $D' \models D$
   follows from conditions $c_i$ ($1 \le i \le 4$).
3. For every $k = 1..5$, the condition of application of the rule $R_k$
   corresponds to the negation of $C_k$. When a diagram is in normal
   form for the rule $R_k$, it either satisfies $C_k$ or is $\bot_d$.
4. To prove that $R^w_{NF}$ corresponds to a polynomial strategy, we
   prove by induction that applying the rule $R_k$ in the specified
   direction (from the root to the leaves or from the leaves
   to the root) enforces $C_k$ everywhere, and when $C_k$ holds, the
   rule is not applicable anymore. Finally, we prove that each
   rule $R_k$ for $k = 1..12$ preserves the conjunction of properties
   $\bigwedge_{j=1..(k-1)} C_j$, and as a consequence, we never need to
   reapply any of the rules $R_j$ for $j < k$. ∎

proc $R^w_{NF}(T)$
  1. for every $S \in \mathcal{S}$ from the root to the leaves
       for every $S' \in \mathrm{Sons}(S)$
         try to apply DnPhi($S$, $S'$) to $T$
  2. for every $S \in \mathcal{S}$
       for every $P \in \mathrm{PN}$
         try to apply Unsat($S$, $P$) to $T$
  3. for every $S \in \mathcal{S}$ from the leaves to the root
       for every $Q \in \mathrm{Comp}(S)$
         try to apply UpSup($S$, $Q$) to $T$
  4. for every $S \in \mathcal{S}$ from the leaves to the root
       for every $Q \in \mathrm{Split}(S)$
         try to apply UpInf($S$, $Q$) to $T$
  5. for every $S \in \mathcal{S}$
       try to apply Error($S$) to $T$
  return $T$

proc ItreeSAT($T$)
  if ($R^w_{NF}(T) \neq \bot_d$) return satisfiable
  else return unsatisfiable

Figure 9. Polynomial-time algorithms $R^w_{NF}$ and ItreeSAT to
compute $R^w$ normal form and check satisfiability of i-trees


6.2 Constructing Models for Weakly Consistent I-Trees 


The following Lemma 6 is crucial for the completeness of our 
algorithm, and justifies the definition of weak consistency. 


LEMMA 6 (Model Construction). If an i-tree $T$ is weakly
consistent, then we can construct a model for $T$.


The high-level idea of the proof of Lemma 6 is to first build the first
two components $(\Delta, \alpha)$ of the model, and then extend the model
with $\Xi$ using the independent signatures condition for i-trees. We
build the $(\Delta, \alpha)$ part of the model by building a model for each
subtree using an induction on the height of the i-tree. To construct
models that satisfy Split and Comp constraints in the inductive
step, we use a stronger induction hypothesis: we show that there
exists a model $(\Delta, \alpha)$ for a tree rooted in node $S$ with $|\Delta| = k$
for all $\mathrm{CInf}(S) \le k \le \mathrm{CSup}(S)$, and we rely on the properties
of weak consistency to prove the inductive step. The proof of this
lemma is interesting because similar ideas are used when building
example models that show the completeness in Section 7.

Putting all results in this section together using the argument at 
the beginning of the section, we obtain the following theorem. 


THEOREM 3 (ItreeSAT Correctness). $T$ is satisfiable if and only
if WeakNF($T$) $\neq \bot_d$. Therefore, the algorithm ItreeSAT in
Figure 9 is a sound and complete polynomial-time decision procedure
for the satisfiability of i-trees.
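
The decision procedure can be sketched in a few lines. The code below is an illustrative reading of the five rules and of the traversal order of Figure 9, not the authors' implementation. It assumes a hypothetical encoding in which nodes are strings, the tree is given by a `root` and a `sons` map, every node has an entry in `phi`, `cinf`, and `csup`, and the isolated empty node is left out; `None` stands for $\bot_d$.

```python
import copy


def top_down(sons, root):
    """Nodes of the tree in root-to-leaves (breadth-first) order."""
    order = [root]
    for node in order:                       # the list grows while we iterate
        order.extend(sorted(sons.get(node, ())))
    return order


def weak_normal_form(tree):
    """Apply DnPhi, Unsat, UpSup, UpInf, Error in the order of Figure 9.
    Returns the rewritten tree, or None for the unsatisfiable diagram."""
    t = copy.deepcopy(tree)
    order = top_down(t["sons"], t["root"])
    for S in order:                                        # 1. DnPhi
        for son in t["sons"].get(S, ()):
            t["phi"][son] |= t["phi"][S]
    for S in order:                                        # 2. Unsat
        if any(("-", P) in t["phi"][S] for sign, P in t["phi"][S] if sign == "+"):
            t["csup"][S] = 0
    for S in reversed(order):                              # 3. UpSup
        for Q in t["comp"].get(S, ()):
            t["csup"][S] = min(t["csup"][S], sum(t["csup"][X] for X in Q))
    for S in reversed(order):                              # 4. UpInf
        for Q in t["split"].get(S, ()):
            t["cinf"][S] = max(t["cinf"][S], sum(t["cinf"][X] for X in Q))
    if any(t["cinf"][S] > t["csup"][S] for S in order):    # 5. Error
        return None
    return t


def itree_sat(tree):
    """Satisfiable iff the weak normal form is not the error diagram."""
    return weak_normal_form(tree) is not None
```

The answer is only meaningful when the input really is an i-tree; the sketch does not check the tree shape, independent views, or independent signatures conditions.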


7. Deciding Subsumption of I-Trees 


The goal of this section is to prove that we can decide the subsump- 
tion of i-trees in polynomial time. Note that the subclass of i-trees 
is not closed under negation or implication, so we cannot decide
$T \models T'$ by checking the satisfiability of $\neg(T \Rightarrow T')$. Instead,
our approach is to bring $T$ into a form where the properties of the
models of $T$ are easy to read from $T$. We then check that $T$
entails each of the conditions that correspond to the semantics of $T'$.
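
 k  | name    | conditions                                                                                                          | effect
----+---------+---------------------------------------------------------------------------------------------------------------------+-------------------------------------
 6  | DnInf   | a6) $(\{S\} \uplus Q_0) \in \mathrm{Comp}(S')$; b6) $n = \mathrm{CInf}(S') - \Sigma(\mathrm{CSup}[Q_0])$; c6) $\mathrm{CInf}(S) < n$ | $\mathrm{CInf}(S) \leftarrow n$
 7  | DnSup   | a7) $(\{S\} \uplus Q_0) \in \mathrm{Split}(S')$; b7) $n = \mathrm{CSup}(S') - \Sigma(\mathrm{CInf}[Q_0])$; c7) $\mathrm{CSup}(S) > n$ | $\mathrm{CSup}(S) \leftarrow n$
 8  | CCmp*   | a8) $Q \in \mathrm{Split}(S)$, $\mathrm{CSup}(S) \le \Sigma(\mathrm{CInf}[Q])$; b8) $C_n = \mathrm{Comp}(S) \cup \{Q\}$; c8) $C_n \not\subseteq \mathrm{Comp}(S)$ | $\mathrm{Comp}(S) \leftarrow C_n$
 9  | CSplit* | a9) $Q \in \mathrm{Comp}(S)$, $\mathrm{CInf}(S) \ge \Sigma(\mathrm{CSup}[Q])$; b9) $C_n = \mathrm{Split}(S) \cup \{Q\}$; c9) $C_n \not\subseteq \mathrm{Split}(S)$ | $\mathrm{Split}(S) \leftarrow C_n$
 10 | UpPhi   | a10) $Q \in \mathrm{Comp}(S)$; b10) $\phi_n = \Phi(S) \cup \bigcap(\Phi[Q])$; c10) $\phi_n \not\subseteq \Phi(S)$   | $\Phi(S) \leftarrow \phi_n$
 11 | Void*   | a11) $S \neq \emptyset_d$, $\mathrm{CSup}(S) = 0$                                                                   | $\mathrm{Merge}(\{S, \emptyset_d\})$
 12 | Equal*  | a12) $\{S'\} \in \mathrm{Comp}(S)$                                                                                  | $\mathrm{Merge}(\{S, S'\})$

* Follow the application of these rules by Simplify.

Figure 10. Rules for System $R$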


We formalize the intuitive condition of being easy to read in the
notion of strong consistency. We build on the system $R^w$ from the
previous section to create a larger rewriting system $R$ for ensuring
strong consistency. We introduce a polynomial-time strategy for $R$
that transforms every i-tree into $\bot_d$ or into an i-tree that is strongly
consistent, and we give polynomial-time algorithms for extracting
the information from strongly consistent i-trees.


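DEFINITION 12 (Strong Consistency). An i-tree $T$ is strongly
consistent iff it is weakly consistent and satisfies all of the following
properties:

  $\forall Q \in \mathrm{Comp}(S).\ \forall S_0 \in Q.\ \mathrm{CInf}(S_0) \ge \mathrm{CInf}(S) - \Sigma(\mathrm{CSup}[Q - \{S_0\}])$    ($C_6$)
  $\forall Q \in \mathrm{Split}(S).\ \forall S_0 \in Q.\ \mathrm{CSup}(S_0) \le \mathrm{CSup}(S) - \Sigma(\mathrm{CInf}[Q - \{S_0\}])$   ($C_7$)
  $\forall Q \in \mathrm{Split}(S).\ Q \notin \mathrm{Comp}(S) \Rightarrow \mathrm{CSup}(S) > \Sigma(\mathrm{CInf}[Q])$          ($C_8$)
  $\forall Q \in \mathrm{Comp}(S).\ Q \notin \mathrm{Split}(S) \Rightarrow \mathrm{CInf}(S) < \Sigma(\mathrm{CSup}[Q])$          ($C_9$)
  $\forall Q \in \mathrm{Comp}(S).\ \bigcap(\Phi[Q]) \subseteq \Phi(S)$                                  ($C_{10}$)
  $S \neq \emptyset_d \Rightarrow \mathrm{CSup}(S) > 0$                                                  ($C_{11}$)
  $Q \in \mathrm{Comp}(S) \Rightarrow |Q| > 1$                                                   ($C_{12}$)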

7.1 A rewriting system R to enforce strong consistency 


This section follows the development of Section 6.1. 


DEFINITION 13 (System $R$). The system $R$ extends $R^w$ with the
additional rules of Figure 10, analogously to Definition 11.


LEMMA 7 (Properties of $R$).
1. $R$ is iT-stable, that is
   $T \in iT \;\wedge\; T \xrightarrow[R]{} T' \;\Rightarrow\; T' \in iT$
2. $R$ preserves the semantics, that is
   $D \xrightarrow[R]{} D' \;\Rightarrow\; D \equiv D'$
R 

proc $R_{NF}(T)$
  1...5. $T \leftarrow$ WeakNF($T$)
  6. for each $S' \in \mathcal{S}$ from the root to the leaves
       for each $Q \in \mathrm{Comp}(S')$
         for every $S \in Q$
           try DnInf($S$, $S'$, $Q$)
  7. for each $S' \in \mathcal{S}$ from the root to the leaves
       for each $Q \in \mathrm{Split}(S')$
         for each $S \in Q$
           try DnSup($S$, $S'$, $Q$)
  8. for each $S \in \mathcal{S}$
       for each $Q \in \mathrm{Split}(S)$
         try CCmp($S$, $Q$)
  9. for each $S \in \mathcal{S}$
       for each $Q \in \mathrm{Comp}(S)$
         try CSplit($S$, $Q$)
  10. for each $S \in \mathcal{S}$ from the leaves to the root
        for each $Q \in \mathrm{Comp}(S)$
          try UpPhi($S$, $Q$)
  11. for each $S \in \mathcal{S}$ from the leaves to the root
        try Void($S$)
  12. for each $S \in \mathcal{S}$
        for each $Q \in \mathrm{Comp}(S)$
          try Equal($S$, $Q$)
  return $T$

Figure 11. Polynomial-time algorithm $R_{NF}(T)$ to compute $R$
normal form


3. $R$ enforces strong consistency when possible, that is, a diagram
   $D$ in $R$ normal form is either equal to $\bot_d$ or it is strongly
   consistent.

4. $R$ terminates in polynomial time for the strategy corresponding
   to the algorithm $R_{NF}$ described in Figure 11.


Proof sketch.

1. The iT-stability is trivial for the rules DnInf, DnSup,
   UpPhi. The other rules are marked with a star and we use the
   algorithm Simplify. In fact, we can show that it is not necessary
   to apply Simplify in its full generality, but only to remove any
   redundant views introduced by CCmp and CSplit, remove
   any self edges introduced by the operation Merge used in the
   rules Equal and Void, and to remove the edges going to $\emptyset_d$
   that can be introduced by the rule Void.

2, 3. Follow by construction as in the previous section.

4. This part is significantly more difficult than for system $R^w$,
   because the interactions between the rules are more complex,
   but follows the same structure as the proof for $R^w$.


7.2 Extracting Information from Strongly Consistent I-Trees 


In this section we start from a strongly consistent i-tree $T$ and
consider the problem of checking $T \models D'$. Analyzing Definition 2,
we observe that a diagram corresponds to a conjunction of
constraints. Therefore, the subsumption problem $T \models D'$ corresponds
to the problem of verifying that $T$ entails atomic formulas of the
form $s = \emptyset$, $s_1 = s_2$, $s_1 \subseteq s_2$, $a \le |s| \le b$, $s \subseteq P$, $s \subseteq P^c$,
$s_1 \cap s_2 = \emptyset$ and $s \subseteq \bigcup\{s_1,\ldots,s_n\}$. Without danger of
confusion, we write $T \models A$ when the atomic formula $A$ holds in all
models for $T$.


THEOREM 4. Let $T$ be a strongly consistent i-tree and let $H^T_A$ for
atomic formula $A$ be as defined in Figure 12. Then $T \models A$ if and
only if $H^T_A$.


2005/8/3 


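proc Subsumes($T$, $D'$)
  $T := R_{NF}(T)$
  let $f : \mathrm{SN} \to \mathcal{S}$ such that $\forall s \in \mathrm{SN}.\ s \in f(s)$
  let $h' : \mathcal{S}' \to \mathrm{SN}$ be any function such that $\forall S' \in \mathcal{S}'.\ h'(S') \in S'$
  check all of the following conditions:

  1. $\bigwedge_{S \in \mathcal{S}'} \bigwedge_{s_1, s_2 \in S} H^T_{s_1 = s_2}$
     where $H^T_{s_1 = s_2} \overset{\text{def}}{\iff} f(s_1) = f(s_2)$
  2. $\bigwedge_{s \in \emptyset_d'} H^T_{s = \emptyset}$
     where $H^T_{s = \emptyset} \overset{\text{def}}{\iff} f(s) = \emptyset_d$
  3. $\bigwedge_{S \in \mathcal{S}'} H^T_{\mathrm{CInf}'(S) \le |h'(S)| \le \mathrm{CSup}'(S)}$
     where $H^T_{a \le |s| \le b} \overset{\text{def}}{\iff} a \le \mathrm{CInf}(f(s)) \le \mathrm{CSup}(f(s)) \le b$
  4. $\bigwedge_{S \in \mathcal{S}'} \bigwedge_{(+P) \in \Phi'(S)} H^T_{h'(S) \subseteq P}$
     where $H^T_{s \subseteq P} \overset{\text{def}}{\iff} (+P) \in \Phi(f(s))$
  5. $\bigwedge_{S \in \mathcal{S}'} \bigwedge_{(-P) \in \Phi'(S)} H^T_{h'(S) \subseteq P^c}$
     where $H^T_{s \subseteq P^c} \overset{\text{def}}{\iff} (-P) \in \Phi(f(s))$
  6. $\bigwedge_{S \in \mathcal{S}'} \bigwedge_{S'' \in \mathrm{Sons}'(S)} H^T_{h'(S'') \subseteq h'(S)}$
     where $H^T_{s_1 \subseteq s_2} \overset{\text{def}}{\iff} f(s_1) = \emptyset_d \vee f(s_1) \leadsto^* f(s_2)$
  7. $\bigwedge_{S \in \mathcal{S}'} \bigwedge_{Q \in \mathrm{Split}'(S)} \bigwedge_{S_1, S_2 \in Q,\ S_1 \neq S_2} H^T_{h'(S_1) \cap h'(S_2) = \emptyset}$
     where $H^T_{s_1 \cap s_2 = \emptyset} \overset{\text{def}}{\iff} f(s_1) = \emptyset_d \vee f(s_2) = \emptyset_d \vee \mathrm{disj}^*_T(f(s_1), f(s_2))$
  8. $\bigwedge_{S \in \mathcal{S}'} \bigwedge_{Q \in \mathrm{Comp}'(S)} H^T_{h'(S) \subseteq \bigcup h'[Q]}$
     where $H^T_{s \subseteq \bigcup Z} \overset{\text{def}}{\iff} f(s) = \emptyset_d \vee \mathrm{Included}(f(s), f[Z], T)$

  where
    proc Included($S_0$, $C$, $T$)
      return $\bigvee_{S_0 \leadsto^* S} \mathrm{Incl}(S)$
    proc Incl($S$)
      if $S \in C$ then return true
      else return $\bigvee_{Q \in \mathrm{Comp}(S)} \bigwedge_{S' \in Q} \mathrm{Incl}(S')$

Figure 12. An algorithm for computing $T \models D'$ for an i-tree
$T$ and an arbitrary diagram $D'$.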
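
The two procedures Included and Incl of Figure 12 translate directly into recursive code. The sketch below assumes nodes are strings, `sons` maps a node to its sons, and `comp` maps a node to its list of complete views; these names and the encoding are hypothetical.

```python
def ancestors(sons, node):
    """All S with node ~>* S (node itself included); X ~> Y iff X in sons[Y]."""
    seen, stack = {node}, [node]
    while stack:
        current = stack.pop()
        for parent, children in sons.items():
            if current in children and parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def incl(comp, c, s):
    """S is covered by C if it belongs to C, or if some complete view of S
    consists only of covered nodes."""
    if s in c:
        return True
    return any(all(incl(comp, c, child) for child in view)
               for view in comp.get(s, ()))


def included(s0, c, sons, comp):
    """S0 is included in the union of C if some ancestor of S0 is covered."""
    return any(incl(comp, c, s) for s in ancestors(sons, s0))
```

The recursion in `incl` terminates because the diagram is acyclic; memoizing its results per node would keep the running time polynomial on large trees.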


It is easy to verify that $H^T_A$ implies $T \models A$. The proof of the
converse is based on the following two lemmas, which provide a
link between strong and weak consistency.


LEMMA 8 (Bounds Refinement). Let $T$ be a strongly consistent
i-tree, $S \in \mathcal{S}$, and $i, s$ such that $\mathrm{CInf}(S) \le i \le s \le \mathrm{CSup}(S)$. Let
$T' = T[\mathrm{CInf}(S) \leftarrow i,\ \mathrm{CSup}(S) \leftarrow s]$ and $T'_{NF} = R^w_{NF}(T')$. Then
1) $T'_{NF} \neq \bot_d$, 2) $T'_{NF} \models T$, and 3) if $\neg(S \leadsto^* S_0)$, then

$$(\mathrm{CInf}(S_0), \mathrm{CSup}(S_0)) = (\mathrm{CInf}'_{NF}(S_0), \mathrm{CSup}'_{NF}(S_0)).$$


Proof sketch.

1. We prove this result by induction on the depth of $S$ in the tree
   $(\mathcal{S}, \leadsto)$. The key step of this proof is to show that the application
   of UpSup and/or UpInf to the father $S'$ of $S$ does not produce
   a situation where $a_5$ holds in the resulting diagram $T''$ (and
   therefore the rule Error is not applicable in $T''$). We use the fact
   that $R^w_{NF}$ applies the rules UpInf and UpSup bottom up, and
   prove that each application preserves $C_5$, only increases CInf
   and only decreases CSup. At each step we distinguish three
   cases:

   (a) both UpSup and UpInf are applicable; then the result
       follows from $C_5$;

   (b) only UpSup is applicable; then the result follows from $C_6$;

   (c) only UpInf is applicable; then the result follows from $C_7$.

2. Follows easily from the hypothesis $\mathrm{CInf}(S) \le i \le s \le \mathrm{CSup}(S)$
   and the fact that $R^w_{NF}$ is semantics preserving.

3. It is enough to notice that only rules UpInf and UpSup are used
   when applying $R^w_{NF}$, and these rules are applied in the
   bottom-up direction. ∎


The fact that the resulting i-tree $T'_{NF}$ is no longer strongly consistent
prevents us from applying this lemma twice starting from a given strongly
consistent i-tree. To enforce more than one restriction, we need to
refine the bounds of several nodes simultaneously. For this purpose,
we use the following lemma.


LEMMA 9 (Parallel Bounds Refinement). Let $T$ be a strongly
consistent i-tree, and $(Q_0, \leadsto)$ a subtree of $T$ such that

• The nodes of $Q_0$ are pairwise independent, that is,
  $\forall S_1, S_2 \in Q_0.\ \neg(\mathrm{disj}^*_T(S_1, S_2))$
• $(Q_0, \leadsto)$ has the same root as $T$.

Then the i-tree $T'$ defined by the simultaneous update

$$T' \overset{\text{def}}{=} T[\forall S \in Q_0 : \mathrm{CInf}(S) \leftarrow \mathrm{CSup}(S)]$$

is such that its $R^w$ normal form $T'_{NF} = R^w_{NF}(T')$ satisfies

Lemmas 8 and 9 are the basic tools we need to show that 
the information syntactically computed from an i-tree is the most 
precise information computable from the semantics of the i-tree. 
We prove this property for each of the atomic formulas A. 


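LEMMA 10. If an i-tree $T$ is strongly consistent, then for all $S \in \mathcal{S}$
we have

$$S \neq \emptyset_d \;\Rightarrow\; \exists M.\ M \models T \wedge \alpha_M(S) \neq \emptyset$$

Proof. If $S \neq \emptyset_d$, we have $\mathrm{CSup}(S) > 0$ by $C_{11}$ and therefore
the i-tree $T' \overset{\text{def}}{=} T[\mathrm{CInf}(S) \leftarrow \max(1, \mathrm{CInf}(S))]$ subsumes $T$. By
Lemma 8, $T'$ is satisfiable, and we can take any model of $T'$ as a
model of $T$. ∎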


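LEMMA 11. If an i-tree $T$ is strongly consistent, then for all $S \in \mathcal{S}$
we have

$$\exists M.\ M \models T \wedge |\alpha_M(S)| = \mathrm{CInf}(S)$$
$$\exists M.\ M \models T \wedge |\alpha_M(S)| = \mathrm{CSup}(S)$$

Proof. According to Lemma 8, the two i-trees
$T_1' \overset{\text{def}}{=} T[\mathrm{CSup}(S) \leftarrow \mathrm{CInf}(S)]$ and $T_2' \overset{\text{def}}{=} T[\mathrm{CInf}(S) \leftarrow \mathrm{CSup}(S)]$
are satisfiable, and both $T_1'$ and $T_2'$ trivially subsume $T$. Any model
$M_1$ of $T_1'$ is such that $|\alpha_{M_1}(S)| = \mathrm{CInf}(S)$, and any model $M_2$
of $T_2'$ is such that $|\alpha_{M_2}(S)| = \mathrm{CSup}(S)$. ∎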


LEMMA 12. If an i-tree $T$ is strongly consistent, $S_0 \in \mathcal{S}$,
$C \in \mathcal{P}(\mathcal{S})$, and Included($S_0$, $C$, $T$) returns false, then

$$\exists M.\ M \models T \wedge \alpha_M(S_0) \not\subseteq \bigcup \alpha_M[C]$$

Proof sketch. Assume that Included returns false. We argue
that the model $M$ exists in several steps. Let $Q_0$ be the smallest
set of nodes such that:

  $S_0 \leadsto^* S \Rightarrow S \in Q_0$
  $S \in Q_0 \wedge C_1 \in \mathrm{Comp}(S) \wedge S_1 \in C_1 \wedge \neg\mathrm{Incl}(S_1) \Rightarrow S_1 \in Q_0$

By definition of Incl, we have $Q_0 \cap C = \emptyset$.

$Q_0$ is tree-shaped by construction, but may contain two nodes
which are explicitly disjoint. We therefore compute a subtree $Q_1$ of
$Q_0$, by starting from the root and keeping at most one son for each
complete view. In this process we ensure that $Q_1$ contains $S_0$, by
not cutting the branch which leads to $S_0$.

We then apply Lemma 9 to $Q_1$, and construct a model of the
resulting i-tree while enforcing that a certain element $x \in \alpha(S)$ is
such that $x \in \alpha(S') \iff S' \in Q_1$ for all $S' \in \mathcal{S}$. More precisely,
we prove by induction on $n$ that for each node $S_1$ of $Q_1$ of depth $n$
in the tree $Q_1$ we can construct a model $(\Delta, \alpha, \Xi)$ for the sub-i-tree
of $T$ with root $S_1$ such that

$$\forall S'.\ S' \leadsto^* S_1 \Rightarrow (x \in \alpha(S') \iff S' \in Q_1)$$

If $n = 0$ then $S_1$ is a leaf of $(Q_1, \leadsto)$ and has no complete view by
construction of $Q_1$. Then using $C_8$ we show that we can construct
a model of the sub-i-tree with root $S_1$ containing a fresh element
(not included in any of the sons of $S_1$).

If $n > 0$, we can deal with the split views in the same way, but
this time $S_1$ can have some complete views. If this complete view
contains a unique split view, we avoid merging $x$ in a son of $S_1$ with
elements in the other sons of $S_1$. If there exists more than one split
view, we can use $C_9$ and construct the model using a refinement of
the ideas of Lemma 6.

Finally, since $S_0 \in Q_1$, we have $x \in \alpha(S_0)$, and since
$C \cap Q_1 = \emptyset$ we have $x \notin \bigcup \alpha[C]$. ∎


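LEMMA 13. If an i-tree $T$ is strongly consistent, for all $S_1, S_2 \in \mathcal{S}$
such that $S_1 \neq \emptyset_d$ and $S_2 \neq \emptyset_d$ we have

$$\neg(S_1 \leadsto^* S_2) \;\Rightarrow\; \exists M.\ M \models T \wedge \alpha_M(S_1) \not\subseteq \alpha_M(S_2)$$

Proof. The property $\exists M.\ M \models T \wedge \alpha_M(S_1) \not\subseteq \alpha_M(S_2)$ can
be checked using Included($S_1$, $\{S_2\}$, $T$). Using $C_{12}$ we show
that this test is equivalent to testing $S_1 \leadsto^* S_2$. ∎


LEMMA 14. If an i-tree $T$ is strongly consistent, for all $S_1, S_2 \in \mathcal{S}$
we have

$$S_1 \neq S_2 \;\Rightarrow\; \exists M.\ M \models T \wedge \alpha_M(S_1) \neq \alpha_M(S_2)$$

Proof. If $S_1 \neq S_2$, then $\neg(S_1 \leadsto^* S_2)$ or $\neg(S_2 \leadsto^* S_1)$. In either
case the result follows from Lemma 13. ∎


LEMMA 15. If an i-tree $T$ is strongly consistent, then for all $S \in \mathcal{S}$
and $P \in \mathrm{PN}$ we have

$$(+P) \notin \Phi(S) \;\Rightarrow\; \exists M.\ M \models T \wedge \alpha_M(S) \not\subseteq \Xi(P)$$
$$(-P) \notin \Phi(S) \;\Rightarrow\; \exists M.\ M \models T \wedge \alpha_M(S) \not\subseteq \Xi(P)^c$$

Proof sketch. Let $S \in \mathcal{S}$ be such that $(+P) \notin \Phi(S)$. We define
$Q_P \overset{\text{def}}{=} \{S' \in \mathcal{S} \mid (+P) \in \Phi(S')\}$. Using $C_{10}$ and $C_1$ we show that
Included($S$, $Q_P$, $T$) returns false. By Lemma 12, there exists a
model such that $\alpha(S) \not\subseteq \bigcup \alpha[Q_P]$. We then change the model by
redefining $\Xi'$ on PN as $\Xi'(P) = \bigcup \alpha[Q_P]$, so $\alpha(S) \not\subseteq \Xi'(P)$.
The case $(-P) \notin \Phi(S)$ is dual and follows from the previous case
by swapping $(+P)$ and $(-P)$ in the i-tree and taking complements
of $\Xi(P)$. ∎


LEMMA 16. If an i-tree $T$ is strongly consistent, then for all
$S_1, S_2 \in \mathcal{S}$ such that $S_1 \neq \emptyset_d$, $S_2 \neq \emptyset_d$ we have

$$\neg(\mathrm{disj}^*(S_1, S_2)) \;\Rightarrow\; \exists M.\ M \models T \wedge \alpha_M(S_1) \cap \alpha_M(S_2) \neq \emptyset.$$
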
From the previously stated lemmas, we can prove Theorem 4.
From Theorem 4 and Lemma 7 we conclude that the algorithm in
Figure 12 is a correct and complete test for subsumption, not only
between trees, but also between a tree and an arbitrary diagram.


8. Related Work 


Boolean algebras with cardinalities. Quantifier-free formulas of 
boolean algebra are NP-complete [33]. Quantified formulas of 
boolean algebra are in alternating exponential space with a lin- 
ear number of alternations [21]. Cardinality constraints naturally 
arise in quantifier elimination for boolean algebras [31, 42, 43]. 
Quantifier elimination implies that each first-order formula of the 
language of boolean algebras is equivalent to some quantifier-free 
formula with constant cardinalities; however, quantifier elimina- 
tion may introduce an exponential blowup. The first-order theory 
of boolean algebras of finite sets with symbolic cardinalities, or, 
equivalently, boolean algebras of sets with equicardinality operator 
is shown decidable in [14]. These results are repeated, motivated 
by constraint solving applications, in [23, 39] and a special case 
with quantification over elements only is presented in [44]. Upper 
and lower bounds on the complexity of this problem were shown in 
[22] which also introduces the name BAPA, for Boolean Algebra 
with Presburger Arithmetic. The quantifier-free case of BAPA was 
studied in [45] with an NEXPTIME decision procedure, which is 
also achieved as a special case of [23, 22]. The new decision pro- 
cedure in the present paper improves this bound to PSPACE and 
gives insight into the problem by reducing it to boolean algebras 
with binary-encoded large cardinalities, and showing that it is not 
necessary to explicitly construct all set partitions. 

Several decidable fragments of set theory are studied in [10]. 
Cardinality constraints also occur in description logics [5] and 
two-variable logic with counting [35, 19, 38]. However, all logics 
of counting that we are aware of have complexity that is beyond 
PSPACE. 

We are not aware of any previously known fragments of boolean 
algebras of sets with cardinality constraints that have polynomial- 
time satisfiability or subsumption algorithms. Our polynomial-time 
result for i-trees is even more interesting in the light of the fact that
our constraints can express some "disjunction-like" properties such
as $A = B \cup C$.


Set constraints. Set constraints [1, 3, 2, 6] are incomparable to 
the constraints studied in our paper. On the one hand, set constraints 
are interpreted over ground terms and contain operations that ap- 
ply a given free function symbol to each element of the set, which 
makes them suitable for encoding type inference [4] and interproce- 
dural analysis [20, 34]. Researchers have also explored the efficient 
computation of the subset relation for set constraints [13]. On the 
other hand, set constraints do not support cardinality operators that 
are useful in modelling databases [40, 11] and analysis of the sizes 
of data structures [29]. Tarskian constraints use uninterpreted func- 
tion symbols instead of free function symbols and have very high 
complexity [18]. 


9. Conclusions


Constraints on sets and relations are very useful for analysis of soft- 
ware artifacts and their abstractions. Reasoning about sets and re- 
lations often involves reasoning about their sizes. For example, an 
integer field may be used to track the size of the set of objects stored 
in a data structure. In this paper, we have presented new complexity 
results and algorithms for solving constraints on boolean algebra 
of sets with symbolic and constant cardinality constraints. We have 
presented symbolic constraints and large constant constraints, given
a more efficient algorithm for quantifier-free symbolic constraints,
identified several sources of NP-hardness of constraints, and
presented a new class of constraints for which satisfiability and
entailment are solvable in polynomial time. We hope that our results
will serve as concrete recipes and general guidance in the design of 
algorithms for constraint solving and program analysis. 

A. Proofs

A.1 I-Diagrams


Lemma 3 I-diagrams have the same expressive power as CBAC
constraints.


Proof. We translate an i-diagram into a CBAC constraint as follows.
As in Figure 2, we note that b1 = b2 and b1 ⊆ b2 can be expressed in
the form |b| = 0, so we may assume that they are part of CBAC.
Similarly, |b| ≤ k can be expressed as b ⊆ s ∧ |s| = k for a fresh
variable s, and |b| ≥ k can be expressed as s ⊆ b ∧ |s| = k. We
translate ⊥_d into e.g. |∅| = 1. Next consider D =
(𝒮, ∅_d, Sons, Split, Comp, CInf, CSup, Φ). For each S ∈ 𝒮, let
n(S) ∈ S be a representative set name. For each S_i ∈ S \ {n(S)}
introduce conjunct S_i = n(S). Next, for each S_1 ∈ Sons(S),
introduce a conjunct S_1 ⊆ S. For each +P ∈ Φ(S), introduce conjunct
S ⊆ P, and for each −P ∈ Φ(S) conjunct S ⊆ P^c. Express the bounds
using conjuncts |S| ≤ CSup(S) and |S| ≥ CInf(S). For each
Q ∈ Split(S) and S_1, S_2 ∈ Q where S_1 ≠ S_2, introduce conjunct
|S_1 ∩ S_2| = 0. For each {S_1, ..., S_n} ∈ Comp(S) introduce
conjunct S = S_1 ∪ ... ∪ S_n.
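The first direction of the translation can be sketched in a few lines of
Python. This is only an illustration under simplifying assumptions: each
node is identified with a single set name (so the representative-name
equalities are omitted), and the dictionary layout with the keys sons,
split, comp, phi, cinf and csup is hypothetical, not a data structure
taken from the paper.

```python
from itertools import combinations


def idiagram_to_cbac(diagram):
    """Return CBAC conjuncts (as strings) for a toy i-diagram.

    `diagram` maps a node name to a dict with the hypothetical keys
    'sons', 'split', 'comp', 'phi', 'cinf', 'csup'.
    """
    conjuncts = []
    for s, d in diagram.items():
        # Each son is a subset of its father.
        for son in sorted(d["sons"]):
            conjuncts.append(f"{son} subseteq {s}")
        # Signature atoms: +P gives S subseteq P, -P gives S subseteq P^c.
        for sign, p in sorted(d["phi"]):
            if sign == "+":
                conjuncts.append(f"{s} subseteq {p}")
            else:
                conjuncts.append(f"{s} subseteq {p}^c")
        # Cardinality bounds.
        conjuncts.append(f"|{s}| >= {d['cinf']}")
        conjuncts.append(f"|{s}| <= {d['csup']}")
        # Nodes of a split view are pairwise disjoint.
        for view in d["split"]:
            for a, b in combinations(sorted(view), 2):
                conjuncts.append(f"|{a} cap {b}| = 0")
        # A complete view covers the father.
        for view in d["comp"]:
            conjuncts.append(f"{s} = " + " cup ".join(sorted(view)))
    return conjuncts
```

The number of generated conjuncts is polynomial in the size of the
diagram, since each split view contributes at most quadratically many
disjointness conjuncts.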

We translate a CBAC constraint into an i-diagram using the following
observations. It is sufficient to translate the following boolean
algebra expressions: s0 = s1 ∪ s2, s0 = s1^c, and |s| = k. We
construct an i-diagram whose nodes are singletons. We pick one
set variable u to act as a universal set and put {s} ∈ Sons({u})
for every set variable s in the i-diagram. We translate s0 = s1 ∪ s2
as {{s1}, {s2}} ∈ Comp({s0}) and translate s0 = s1^c as
{{s0}, {s1}} ∈ Split({u}), {{s0}, {s1}} ∈ Comp({u}). We
translate |s| = k as CInf({s}) = k and CSup({s}) = k. Then
for each satisfying assignment of CBAC there is a model for the
constructed i-diagram where {u} is interpreted as a universal set.
Conversely, for a model of the i-diagram where α({s}) = A, we let
[s ↦ A ∩ α({u})]. The result is an assignment that satisfies the
original CBAC formula. ∎


Lemma 4 For every i-diagram D we can obtain an equivalent 
simple i-diagram using the polynomial-time algorithm in Figure 6. 


Proof. We argue that the algorithm in Figure 6 produces a diagram that
is i) well-formed, ii) simple, and iii) equivalent to the original diagram.

We first observe that after step 2, the following two conditions 
hold: 


C) if Q ∈ Split(S) and S ∈ Q, then Q ⊆ {S, ∅_d}. This condition
holds because step 1.5 of the algorithm merges all nodes in
Q \ {S} with ∅_d when S ∈ Q ∈ Split(S).

D) if Q ∈ Comp(S), then Q ≠ ∅ (by step 1.3) and if |Q| = 1 then
Q = {S} (by step 1.6).

i) To see that the resulting i-diagram is well-formed, it suffices
to check the conditions ∪Split(S) = Sons(S) and ∪Comp(S) ⊆
Sons(S). These conditions are preserved by the factor-diagram
construction (for any equivalence relation). They are preserved by
step 3 for the following reason. The only nodes removed from Sons(S)
are ∅_d and S. These nodes do not appear in ∪Comp(S) because ∅_d is
removed from each view Q, and views with S ∈ Q are removed.
It remains to check that Sons(S) ⊆ ∪Split(S) after step 3, and
this holds because our condition (C) implies that no element other
than S, ∅_d is lost from ∪Split(S) in step 3. The well-formedness
condition is preserved by step 4 because this step does not change
∪Comp(S) or ∪Split(S). Step 5 does not violate this condition
either because it sets the components of ∅_d to ∅.

ii) To see that the resulting diagram is simple, we show that
it satisfies conditions a),...,f) of Definition 6. After step 2 of the
algorithm, the resulting factor-diagram has no cycles of length 2
or more; there are potentially only some self-cycles. These are
eliminated in step 3 and no further edges are introduced. Hence,
a) holds. Each of the remaining conditions is enforced in a
certain step and not violated afterwards, according to the following
table:

[Table relating the conditions b),...,f) to the steps of the algorithm;
its entries are not recoverable from the source.]


iii) We show that semantics is preserved when executing each
sequence of steps 1,...,k for 2 ≤ k ≤ 5, that is, each step
preserves the semantics provided that it is executed after the
previous steps.


k = 2. Each equality introduced into ρ is a semantic
consequence of the diagram, because

1.1 α(S1) ⊆ α(S2) and α(S2) ⊆ α(S1),

1.2 α(S1) ⊆ α(∅_d) = ∅,

1.3 α(S1) ⊆ ∪∅ = ∅,

1.4 α(S1) ∩ α(S2) = ∅ for α(S1) = α(S2), so α(S2) = ∅, or

1.5 α(S1) ∩ α(S2) = ∅ for α(S1) = α(S0) and α(S2) ⊆ α(S0),
so again α(S2) = ∅,

1.6 α(S1) ⊆ α(S) and α(S) ⊆ ∪{α(S1)} = α(S1).

It follows that the condition on equality of sets, as well as the
conditions on CInf, CSup, Sons, Φ, Comp are all semantically
equivalent when applied to the original and the factor diagram. The
only semantic condition which can be lost in the factor-diagram
construction is disj(S1, S2) when the nodes S1 and S2 are merged,
that is, when (S1, S2) ∈ ρ. However, in this case the disjointness
condition follows from α(S1) = ∅, which is enforced in 1.3. Therefore,
for the particular relation constructed in step 1, factor-diagram is
an equivalence preserving transformation.

k = 3. We need to show that no information is lost by removing
∅_d and S from the sons, as well as from the split and complete views
of S. Clearly, removing S and ∅_d from Sons(S) does not change
the subset conditions because ∅ ⊆ α(S) and α(S) ⊆ α(S).
Eliminating ∅_d from Q ∈ Comp(S) is justified because the view
has the same semantics with or without ∅_d. Dropping a view Q ∈
Comp(S) for S ∈ Q is justified because in that case α(S) ⊆
∪_{S1 ∈ Q} α(S1) holds trivially. Eliminating ∅_d from Q ∈ Split(S)
is justified because intersection with the empty set is always empty,
so this condition does not bring any new information. Finally,
dropping a Q ∈ Split(S) with S ∈ Q is justified because condition
(C) implies that in such a case Q ⊆ {∅_d, S}, so the Split condition
is trivial.

k = 4. Removing {∅_d} from Split(S) preserves semantics
because such a view carries no information. Similarly, because all
maximal views are preserved, removing their subsets does not
change the semantics. For Comp(S), we consider two cases. In the first
case ∅ ∉ Comp(S). In this case, removing ∅ does not have any
effect, and it is sound to remove all non-minimal views because
they are implied by the minimal views. The second case is ∅ ∈
Comp(S). By condition D) on step 1, we know that Q ≠ ∅ after
step 2, and the only node removed in step 3 is ∅_d, so it must have
been the case that Q = {∅_d} after step 2. By condition D), we then
have S = ∅_d. Because the semantic condition on Comp for Q = ∅
reduces to α(S) = ∅, this condition brings no new information, so
we can remove it.

k = 5. Because α(∅_d) = ∅, setting CSup(∅_d) = 0 does not change
semantics, and similarly for Φ(∅_d) = ∅. For S = ∅_d we also know
that Sons(S) ⊆ {∅_d} because this condition is ensured by step 2 and
is not violated afterwards. Because we have already observed that the
diagram is well-formed, we conclude ∪Comp(S) ⊆ {∅_d} and
∪Split(S) ⊆ {∅_d}, so setting these values to ∅ does not change the
semantics. ∎


Theorem 2 Omitting any one out of three conditions from Defini- 
tion 9 (1. being tree-shaped, 2. having independent views, and 3. 
having independent signatures) yields a class of diagrams whose 
satisfiability is NP-hard. 


Proof. Suppose that at least one of the three conditions does not
apply to a class of i-diagrams. We then give a reduction from the
problem 3COL to the satisfiability of i-diagrams in this class. Here
3COL denotes the NP-complete problem of deciding, given an
undirected graph, whether the graph can be colored using 3 colors
such that adjacent nodes have different colors [41, Page 275].

Given a graph (N, E) where E ⊆ N × N is a symmetric
irreflexive relation, we first build the i-diagram D defined as follows:
𝒮 = N ∪ {U}, CInf(U) = CSup(U) = 3, Sons(U) =
Split(U) = {{n} | n ∈ N}, and Comp(U) = Φ(U) = ∅. For
all n ∈ N we let CInf(n) = CSup(n) = 1 and let Sons(n) =
Split(n) = Comp(n) = Φ(n) = ∅.

Each model of this diagram is a triple (A, α, Ξ) such that
|α(U)| = 3 and for all n, α(n) is a singleton included in α(U).
If we consider α(U) as a set of three colors, then in each model
with this property, α(n) indicates the color of node n.

Then for each edge (n1, n2) ∈ E we encode the fact that
n1 and n2 must have different colors by enforcing the property
α(n1) ∩ α(n2) = ∅ on the models of D. We encode this constraint
in different ways depending on the class of i-diagrams:


• If D allows dependent signatures, we introduce a fresh predicate
symbol P_{n1,n2}, and add (+P_{n1,n2}) to Φ(n1), and
(−P_{n1,n2}) to Φ(n2).

• If D allows dependent views, we add {n1, n2} to Split(U).

• If D allows multiple fathers, then we simulate dependent
views by introducing a new node m(n1, n2).
We let CInf(m(n1, n2)) = CSup(m(n1, n2)) = 2,
Sons(m(n1, n2)) = {n1, n2}, Split(m(n1, n2)) =
Comp(m(n1, n2)) = {{n1, n2}}, and Φ(m(n1, n2)) = ∅.
We then remove n1, n2 from Sons(U), and add m(n1, n2) to
Sons(U) instead.


None of these constructions violates more than one of the three con- 
sidered restrictions. It is straightforward to verify that the diagram 
is satisfiable iff the graph is colorable, and that the construction of 
D can be done in polynomial time. This proves that the satisfia- 
bility of i-diagrams with any of the three restrictions removed is 
NP-hard. ∎
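The dependent-views variant of this reduction can be illustrated with a
small Python sketch. It is not the authors' implementation: the dictionary
encoding of the diagram and the function names build_diagram and
satisfiable are assumptions made for the example, Sons(U) is simply taken
to be the set of graph nodes, and satisfiability is checked by brute force
over the three elements of α(U) rather than by any algorithm from the paper.

```python
from itertools import product


def build_diagram(nodes, edges):
    """Build a toy diagram for the dependent-views variant of the reduction.

    Every graph node becomes a leaf with CInf = CSup = 1 below a root U
    with CInf = CSup = 3, and every edge {n1, n2} becomes a split view
    of U. The dictionary layout is a hypothetical encoding.
    """
    split_u = [{n} for n in nodes] + [set(e) for e in edges]
    return {
        "U": {"cinf": 3, "csup": 3, "sons": set(nodes), "split": split_u},
        "leaves": {n: {"cinf": 1, "csup": 1} for n in nodes},
    }


def satisfiable(diagram):
    """Brute-force check: alpha(U) = {0, 1, 2}, each leaf is a singleton.

    A split view requires its nodes to be pairwise disjoint, which for
    singletons means pairwise distinct elements.
    """
    leaves = sorted(diagram["leaves"])
    for choice in product(range(3), repeat=len(leaves)):
        alpha = dict(zip(leaves, choice))
        if all(len({alpha[n] for n in view}) == len(view)
               for view in diagram["U"]["split"]):
            return True
    return False
```

On a triangle the diagram is satisfiable, while on the complete graph with
four nodes it is not, matching 3-colorability of the two graphs.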


A.2 Termination of System 


LEMMA 17 (Invariants of R). For every k ∈ [1..12] the rule R_k
preserves ⋀_{j=1..(k−1)} C_j or returns ⊥_d.


Proof. We analyze each rule R_k for two i-trees T and T′ such
that T →_{R_k} T′, assuming that T satisfies ⋀_{j=1..(k−1)} C_j.
More precisely, we consider the step T →_{R_k} T′ annotated with the
variable names as they appear in the definition of R.


1. (DnPhi). Trivial.

2. (Unsat). (C1) does not depend on CSup.

3. (UpSup). If n = 0, (C2) is trivially true for S in T′. If
n > 0, by (c3) CSup(S) > 0 and we already have ∀P ∈
PN. {+P, −P} ⊄ Φ(S), and since Φ′ = Φ, (C2) holds in T′.

4. (UpInf). Neither (C1), (C2) nor (C3) depend on CInf.

5. (Error). T′ = ⊥_d.

6. (DnInf). Only (C4) and (C5) depend on CInf.

• (C5) is maintained for S in T′ because, noticing that (C5) is
maintained for S′ ≠ S, we have

CInf′(S) = CInf(S′) − Σ(CSup(Q0))
         ≤ CSup(S′) − Σ(CSup(Q0))     (by C5)
         ≤ CSup(S)                    (by C3)

• To prove that (C4) is maintained in T′ we need to check
that (C4) is maintained for S, which is trivial by (c6), and
that (C4) is maintained for the father S′ of S and the views
Q ∈ Split(S′) containing S. By the property of independent
views there exists only one such view Q = ({S} ⊎ Q0′)
such that Q0′ ⊆ Q0 and

Σ CInf′(Q) = CInf′(S) + Σ CInf(Q0′)
           = (CInf(S′) − Σ CSup(Q0)) + Σ CInf(Q0′)
           = CInf(S′) − Σ CSup(Q0 − Q0′)
             + Σ_{S0 ∈ Q0′} (CInf(S0) − CSup(S0))
           ≤ CInf(S′) − Σ CSup(Q0 − Q0′)     (by C5)
           ≤ CInf(S′)
           = CInf′(S′)

7. (DnSup). Only (C2), (C3) and (C5) depend on CSup. (C2) is
maintained thanks to (c7) as for the case of rule UpSup.

• (C5) is maintained for S in T′ because, noticing that (C5) is
maintained for S′ ≠ S, we have

CSup′(S) = CSup(S′) − Σ(CInf(Q0))
         ≥ CInf(S′) − Σ(CInf(Q0))     (by C5)
         ≥ CInf(S)                    (by C4)

• To prove that (C3) is maintained in T′ we need to check
that (C3) is maintained for S, which is trivial by (c7), and
that (C3) is maintained for the father S′ of S and the views
Q ∈ Comp(S′) containing S. By the property of independent
views there exists only one such view Q = ({S} ⊎ Q0) and

Σ CSup′(Q) = CSup′(S) + Σ CSup(Q0)
           = (CSup(S′) − Σ CInf(Q0)) + Σ CSup(Q0)
           = CSup(S′) + Σ_{S0 ∈ Q0} (CSup(S0) − CInf(S0))
           ≥ CSup(S′)                 (by C5)
           = CSup′(S′)

8. (CCmp). Only (C3) and (C6) depend on Comp.

• (C3) is maintained for S, Q because

CSup′(S) = CSup(S)
         ≤ Σ(CInf(Q))      (by a8)
         ≤ Σ(CSup(Q))      (by C5)
         = Σ(CSup′(Q))

• (C6) is maintained for S, Q and all S0 ∈ Q because

CInf′(S) = CInf(S)
         ≤ CSup(S)                              (by C5)
         ≤ Σ(CInf(Q))                           (by a8)
         = CInf(S0) + Σ(CInf(Q − {S0}))
         ≤ CInf(S0) + Σ(CSup(Q − {S0}))         (by C5)
         = CInf′(S0) + Σ(CSup′(Q − {S0}))


(Remark). If we ever use a simplification afterwards, as
indicated by the star in Figure 10, it can only consist in removing
a complete view Q’ such that Q’ ¢ Q. This operation trivially 
maintains ⋀_{j=1..7} C_j because in every consistency property
where complete views appear they are universally quantified.


9. (CSplit). Only (C4) and (C7) depend on Split.

• (C4) is maintained for S, Q because

CInf′(S) = CInf(S)
         ≥ Σ(CSup(Q))      (by a9)
         ≥ Σ(CInf(Q))      (by C5)
         = Σ(CInf′(Q))

• (C7) is maintained for S, Q and all S0 ∈ Q because

CSup′(S) = CSup(S)
         ≥ CInf(S)                              (by C5)
         ≥ Σ(CSup(Q))                           (by a9)
         = CSup(S0) + Σ(CSup(Q − {S0}))
         ≥ CSup(S0) + Σ(CInf(Q − {S0}))         (by C5)
         = CSup′(S0) + Σ(CInf′(Q − {S0}))




(Remark). If we ever use a simplification afterwards, as
indicated by the star in Figure 10, it can only consist in removing
a split view Q′ such that Q′ ⊂ Q. This operation trivially
maintains ⋀_{j=1..8} C_j because in every consistency property
where split views appear they are universally quantified.


10. (UpPhi). Only (C1) and (C2) depend on Φ.

• (C1) is maintained for S and all S′ ∈ Q because

Φ′(S) = Φ(S) ∪ ⋂Φ(Q)                          (by b10)
      = Φ(S) ∪ (Φ(S′) ∩ ⋂Φ(Q − {S′}))
      ⊆ Φ(S) ∪ Φ(S′)
      ⊆ Φ(S′)                                  (by C1)
      = Φ′(S′)

• We can also prove that (C2) is maintained for S. Suppose
there exists P in PN such that {+P, −P} ⊆ Φ′(S). Then
we distinguish three cases.

  ▪ If {+P, −P} ⊆ Φ(S), then by C2, CSup(S) = 0.

  ▪ If {+P, −P} ∩ Φ(S) = ∅, then {+P, −P} ⊆ ⋂Φ(Q).
    Then for all S′ ∈ Q, CSup(S′) = 0 by C2. Then by C3,
    CSup(S) = 0.

  ▪ If {+P, −P} ∩ Φ(S) = {±P} for one atom ±P ∈
    {+P, −P}, then the opposite atom ∓P belongs to
    ⋂Φ(Q) and by C1, ±P belongs to ⋂Φ(Q). By C2, each
    node S′ ∈ Q is such that CSup(S′) = 0 and by C3,
    CSup(S) = 0.

In the three cases CSup′(S) = CSup(S) = 0.

11. (Void). If S is the root, the resulting i-tree T′ is such that
𝒮′ = {S_R′} = {∅_d} and C_i is trivial for all i ∈ [1..11].
Otherwise, we have to check that removing the node S from
the sons of the father of S (as indicated in step 3 of the
procedure simplify) maintains ⋀_{j=1..10} C_j. We denote by S0
the father of S and by Q0 the split view of S0 containing S. If
S is also contained in a complete view of S0 we denote by C0
this complete view.


• (C1) and (C2) are trivially maintained.

• (C3) is maintained because, if C0 exists and C0′ = C0 − {S}
is not empty,

CSup′(S0) = CSup(S0)
          ≤ Σ CSup(C0)       (by C3)
          = Σ CSup(C0′)      (by a12)
          = Σ CSup′(C0′)

• (C4) is maintained because, if Q0′ = Q0 − {S} is not empty,

CInf′(S0) = CInf(S0)
          ≥ Σ CInf(Q0)       (by C4)
          ≥ Σ CInf(Q0′)
          = Σ CInf′(Q0′)

• (C5) is maintained for the node ∅_d′ = S ∪ ∅_d of T′ because
by (a12) and (C5), CInf(S) ≤ CSup(S) = 0, by simplicity
and (C5), CInf(∅_d) ≤ CSup(∅_d) = 0, and therefore
CInf′(∅_d′) = max(CInf(S), CInf(∅_d)) = 0 ≤ CSup′(∅_d′).

• (C6) is maintained for S0, C0 and every S1 ∈ C0 such that
S1 ≠ S because

CInf′(S1) = CInf(S1)
          ≥ CInf(S0) − Σ(CSup(C0 − {S1}))           (by C6)
          = CInf(S0) − Σ(CSup(C0 − {S} − {S1}))     (by a12)
          = CInf′(S0) − Σ(CSup′(C0′ − {S1}))

• (C7) is maintained for S0, Q0 and every S1 ∈ Q0 such that
S1 ≠ S because

CSup′(S1) = CSup(S1)
          ≤ CSup(S0) − Σ(CInf(Q0 − {S1}))           (by C7)
          ≤ CSup(S0) − Σ(CInf(Q0 − {S} − {S1}))
          = CSup′(S0) − Σ(CInf′(Q0′ − {S1}))

• (C8) is maintained for Q0 because

CSup′(S0) = CSup(S0)
          ≥ Σ(CInf(Q0))              (by C8)
          ≥ Σ(CInf(Q0 − {S}))
          = Σ(CInf′(Q0′))

• (C9) is maintained for C0 (if it exists and C0′ = C0 − {S} ≠ ∅)
because

CInf′(S0) = CInf(S0)
          ≤ Σ(CSup(C0))              (by C9)
          = Σ(CSup(C0 − {S}))        (by a11)
          = Σ(CSup′(C0′))

• (C10) is maintained for C0 (if it exists and C0′ = C0 − {S} ≠ ∅)
because

⋂Φ′(C0′) = ⋂Φ(C0 − {S})
         ⊆ ⋂Φ(C0)
         ⊆ Φ(S0)                     (by C10)
         = Φ′(S0)

12. (Equal). Before merging S and S′ for {S′} ∈ Comp(S) we
have

• CInf(S) = CInf(S′) by C4 and C6
• CSup(S) = CSup(S′) by C3 and C7
• Φ(S) = Φ(S′) by C1 and C10

Therefore all the properties C_i for i = 1..10 are trivially
maintained.


A.3 Model Construction


Lemma 6 (Model Construction) If an i-tree T is weakly consistent,
then we can construct a model for T.


Proof. Let T be a weakly consistent i-tree.

We construct a model (A, α, Ξ) by first constructing a partial
model (A, α) for all parts of T except Φ, and then extending
(A, α) with Ξ to satisfy Φ.

Constructing (A, α). We write (A, α) ⊨ T to denote that A and
α satisfy those conditions on 𝒮, Sons, Split, Comp, CSup, CInf that
do not mention Φ in Definition 2. To show we can construct (A, α)
such that (A, α) ⊨ T we prove by induction on n the following
more general claim.


CLAIM 1. For every i-tree T of height n with root S_R:

∀k ∈ [CInf(S_R), CSup(S_R)].
    ∃(A, α). (A, α) ⊨ T ∧ |α(S_R)| = |A| = k


If n = 1, the claim holds by C5 taking
A = α(S_R) = {1, ..., k}.


For n > 1, consider an i-tree T with root S_R and k ∈
[CInf(S_R), CSup(S_R)]. By examining the constraints in T, we
choose the cardinalities for subtrees of T, use the induction
hypothesis to construct models for subtrees, and paste the models for
subtrees into a model for T. We decompose this process into three
steps:
1. For each C ∈ Comp(S_R), consider a subtree T_C built from T
by removing all sons outside ∪C. We construct a model M_C
for T_C of cardinality k.

2. For each remaining Q ∈ Split(S_R) where Q ⊄ ∪Comp(S_R),
consider a subtree T_Q with the same root S_R but without the
sons outside ∪Q. We construct a model M_Q for T_Q of
cardinality k.

3. Because the constructed models have the same cardinality, we 
can easily merge them to obtain a model for T. 


Step 1. Let C ∈ Comp(S_R), and 𝒞 ⊆ Split(S_R) such that
C = ∪𝒞. For each Q ∈ 𝒞, let t_Q =def min(k, Σ(CSup[Q])). Using
k ≥ CInf(S_R) and C4 we can show

Σ(CInf[Q]) ≤ t_Q ≤ Σ(CSup[Q])     (H1)

To each node S ∈ Q we can therefore assign an integer K(S) ∈
[CInf(S), CSup(S)] such that t_Q = Σ(K[Q]). Let T_S be the
sub-i-tree of T rooted at S. By the induction hypothesis, let M_S be
the model of T_S of cardinality K(S). We can then take the disjoint
union of these models to construct a model M_Q of size t_Q for
the forest ∪_{S ∈ Q} T_S.
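The existence of the values K(S) follows from (H1) by a simple greedy
argument, which the following Python sketch makes concrete. The function
name choose_cardinalities and the encoding of bounds as a dictionary of
(CInf, CSup) pairs are assumptions made for the illustration; the paper
does not prescribe a particular way of choosing K.

```python
def choose_cardinalities(bounds, t):
    """Pick K(S) in [CInf(S), CSup(S)] for each node so that sum(K) == t.

    `bounds` maps a node name to a pair (cinf, csup) with cinf <= csup.
    The target t must satisfy (H1): sum(cinf) <= t <= sum(csup).
    """
    lo = sum(cinf for cinf, _ in bounds.values())
    hi = sum(csup for _, csup in bounds.values())
    if not lo <= t <= hi:
        raise ValueError("target violates (H1)")
    chosen = {}
    surplus = t - lo  # amount still to distribute above the lower bounds
    for node, (cinf, csup) in bounds.items():
        extra = min(surplus, csup - cinf)
        chosen[node] = cinf + extra
        surplus -= extra
    return chosen
```

For example, with bounds a: (1, 3) and b: (0, 2) and k = 4, we get
t_Q = min(4, 5) = 4, and the sketch assigns K(a) = 3 and K(b) = 1.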

For all Q ∈ 𝒞, we have t_Q ≤ k by definition of t_Q and k.
We can also prove that k ≤ Σ_{Q ∈ 𝒞} t_Q. Indeed, if there exists a
Q0 ∈ 𝒞 such that t_{Q0} = k, this is trivial. Otherwise, because T
is weakly consistent, from C3 we can show that k ≤ CSup(S_R) ≤
Σ(CSup[C]) because

Σ(CSup[C]) = Σ_{Q ∈ 𝒞} (Σ(CSup[Q])) = Σ_{Q ∈ 𝒞} (t_Q).

We finally obtain

max_{Q ∈ 𝒞} t_Q ≤ k ≤ Σ_{Q ∈ 𝒞} t_Q     (H2)

Thanks to (H2), we can build a model for the i-tree T_C as follows.
We start with the disjoint union of the models M_Q for T_Q for Q ∈ 𝒞.
This model has cardinality Σ_{Q ∈ 𝒞} t_Q. Then, we rename elements
from different models to be identical to elements from other
models. Such merging is possible as long as there is no model whose
domain contains the domains of all others, so we can reach any
cardinality k with max_{Q ∈ 𝒞} t_Q ≤ k.
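One possible way to realize this merging is sketched below in Python. It
is an illustrative renaming scheme, not necessarily the one intended by
the authors: elements of the disjoint union are numbered consecutively and
renamed modulo k. The function name merge_models and the representation of
a model by its size are assumptions of the example.

```python
def merge_models(sizes, k):
    """Rename the elements of disjoint models into the domain {0, ..., k-1}.

    `sizes` maps a view name Q to the size t_Q of its model, whose
    elements are written (Q, 0), ..., (Q, t_Q - 1). Requires (H2):
    max(t_Q) <= k <= sum(t_Q).
    """
    values = list(sizes.values())
    if not (max(values, default=0) <= k <= sum(values)):
        raise ValueError("(H2) is violated")
    renaming = {}
    g = 0  # global index of the element in the disjoint union
    for q, t in sizes.items():
        for i in range(t):
            # Within one model the indices are consecutive and t <= k,
            # so the renaming stays injective on each model.
            renaming[(q, i)] = g % k
            g += 1
    return renaming
```

Because each model occupies at most k consecutive global indices, no two
elements of the same model are identified, and because the total number of
elements is at least k, every element of {0, ..., k-1} is used.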


REMARK 2. (Freedom in the choice of {t_i}_{i ∈ [1..n]}) In Section 7
we enforce some additional properties on models using a different
choice of t_Q. Such a construction is possible whenever the t_i satisfy
(H1) and (H2). Moreover, if K(S) denotes some chosen cardinality
for each node S, and the values K(S) satisfy certain assumptions,
then we can enforce additional properties when merging the
models M_Q corresponding to Q ∈ Split(S). The following two
cases are of interest.


1. If Σ_{Q ∈ 𝒞} t_Q > K(S), we can choose any pair of different split
views Q1, Q2 ∈ 𝒞, and two elements x1 from M_{Q1} and x2
from M_{Q2}, and decide to merge them.


2. If 𝒞 = {Q0} ⊎ 𝒞0 and max_{Q ∈ 𝒞0} t_Q < K(S), we can choose any
element in the model M_{Q0} and decide not to merge it with any
of the elements of the models M_{Q′} for Q′ ∈ 𝒞0.


Step 2. Let Q ∈ Split(S_R), such that Q is not included in any
complete view. We construct a model M_Q of size k for the i-forest
T_Q by first building a model of size K(S′) = CInf(S′) for each
S′ ∈ Q. Because k ≥ CInf(S_R) ≥ Σ(CInf[Q]), by C4, the disjoint
union of these models has cardinality at most k. By adding
the correct number of fresh elements to A, we obtain a model of
cardinality k.


REMARK 3. (Existence of fresh elements) In Section 7 we use the
following property: For all Q ∈ Split(S) with Q ⊄ ∪Comp(S),
if Σ(CInf[Q]) < K(S), then there exists a model such that α(S)
contains an element which does not belong to α(S′) for any of
the sons S′ of S.


Step 3. We can apply an arbitrary bijection σ_C : A_C → [1..n]
to each model M_C constructed as described above to
build a model for the entire i-tree. We let α(S_R) = [1..n] and for
all S ≠ S_R, α(S) = σ_C(α_C(S)) where C is the view containing an
ancestor of S in Split(S_R) or Comp(S_R).


REMARK 4. (Freedom in the choice of σ.) If we know that
K(S) > 0, for any pair S1, S2 of sons of S such that S1, S2 belong
neither to the same split view nor to the same complete view, for
each choice of elements x1, x2 in the models of T_{S1} and T_{S2} we
can choose σ1, σ2 such that σ1(x1) = σ2(x2) =def x, and the resulting
model will be such that x ∈ α(S1) ∩ α(S2).


Extending the model with Ξ. Let (A, α) ⊨ T. Then for each
P ∈ PN, define

Ξ(P) = ∪{α(S) | S ∈ 𝒮 ∧ (+P) ∈ Φ(S)}

Then (+P) ∈ Φ(S) ⇒ α(S) ⊆ Ξ(P) holds by construction;
it remains to show (−P) ∈ Φ(S) ⇒ α(S) ⊆ Ξ(P)^c for every
node S. Consider S1 ∈ 𝒮 such that (−P) ∈ Φ(S1). If S1 = ∅_d,
then α(S1) = ∅, so the condition trivially holds. Similarly, if
(+P) ∈ Φ(S1), then by C2, CSup(S1) = 0 so α(S1) = ∅ and the
condition holds. Otherwise, assume (+P) ∉ Φ(S1). For the sake
of contradiction suppose that there exists an element x ∈ α(S1),
x ∈ Ξ(P). By definition of Ξ(P), there exists a node S2 ≠ S1
such that x ∈ α(S2) and (+P) ∈ Φ(S2). By the condition on
independent signatures, one of the following two cases applies.


1. disj_T(S1, S2). Then α(S1) ∩ α(S2) = ∅ by the semantics
of i-diagrams, which is a contradiction with x ∈ α(S1) and
x ∈ α(S2).

2. S1 and S2 have compatible signatures. Then there exists a
node S such that S1 ⇝ S, S2 ⇝ S and Sig(S1) ∩ Sig(S2) ⊆
Sig(S). Because (−P) ∈ Φ(S1), P ∈ Sig(S1), and
because (+P) ∈ Φ(S2), P ∈ Sig(S2). Therefore P ∈ Sig(S).
We have two cases:


(a) (+P) ∈ Φ(S). By C1, then (+P) ∈ Φ(S1), a contradiction.

(b) (−P) ∈ Φ(S). By C1, then (−P) ∈ Φ(S2). By C2 then
CSup(S2) = 0, so α(S2) = ∅, a contradiction with x ∈
α(S2).

We have reached a contradiction in each case, so we conclude
α(S1) ⊆ Ξ(P)^c. ∎
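The definition of Ξ and the two conditions it must satisfy can be spelled
out in a short Python sketch. The names extend_with_xi and
respects_signatures, and the encoding of Φ as sets of (sign, predicate)
pairs, are assumptions made for this illustration only.

```python
def extend_with_xi(alpha, phi):
    """Compute Xi(P) as the union of alpha(S) over all S with +P in Phi(S).

    `alpha` maps node names to sets of elements; `phi` maps node names
    to sets of (sign, predicate) pairs with sign "+" or "-".
    """
    xi = {}
    for s, atoms in phi.items():
        for sign, p in atoms:
            xi.setdefault(p, set())
            if sign == "+":
                xi[p] |= alpha[s]
    return xi


def respects_signatures(alpha, phi, xi):
    """Check +P in Phi(S) => alpha(S) subset of Xi(P), and
    -P in Phi(S) => alpha(S) disjoint from Xi(P)."""
    for s, atoms in phi.items():
        for sign, p in atoms:
            interp = xi.get(p, set())
            if sign == "+" and not alpha[s] <= interp:
                return False
            if sign == "-" and alpha[s] & interp:
                return False
    return True
```

The positive condition holds by construction for any input; the check is
only informative for the negative atoms, which is exactly the part that
the argument above establishes for weakly consistent i-trees.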


A.4 Details of the Proofs for Subsumption Completeness 
A.4.1 Refinements of Lemma 6


According to the remarks in the proof of Lemma 6, if an i-tree T is
weakly consistent, there exists a choice of cardinalities K : 𝒮 → ℕ
such that we can build a model (A, α, Ξ) for T with the property
|α(S)| = K(S) for all S ∈ 𝒮. For a fixed choice of cardinalities
K, we can, in certain cases, enforce some additional properties
by choosing which elements we merge in steps 1 and 3 of the
construction. The three following lemmas are based on this idea.


LEMMA 18 (Non-empty intersection (1)). Let S1 and S2 be nodes
in a weakly consistent i-tree T such that

CInf(S1) > 0 ∧ S1 ⇝* S1′ ∧ S1′ ∈ Q1 ∧
CInf(S2) > 0 ∧ S2 ⇝* S2′ ∧ S2′ ∈ Q2

for some S, S1′, S2′ ∈ 𝒮, Q1, Q2 ∈ Split(S) where Q1 ≠ Q2 and
¬(∃C ∈ Comp(S). Q1 ⊆ C ∧ Q2 ⊆ C). Then there exists a
model (A, α, Ξ) for T such that

α(S1) ∩ α(S2) ≠ ∅.


Proof. We use the construction described in Lemma 6, with the
exception of step 3 of the construction of the model M_S for the
subtree T_S with root S, where we do the following:


• We choose an element x1 in α1(S1) in the model M_{S1′} built for
T_{S1′} (we know there exists one such x1 because CInf(S1) > 0).

• Analogously, we choose an element x2 in α2(S2) in the model
M_{S2′} built for T_{S2′}.

• We choose the bijections σ_{Q1} and σ_{Q2} in step 3 such that
σ_{Q1}(x1) = σ_{Q2}(x2).


LEMMA 19 (Non-empty intersection (2)). Let S1 and S2 be nodes
in a weakly consistent i-tree T such that

CInf(S1) > 0 ∧ S1 ⇝* S1′ ∧ S1′ ∈ Q1 ∧
CInf(S2) > 0 ∧ S2 ⇝* S2′ ∧ S2′ ∈ Q2

for some S, S1′, S2′ ∈ 𝒮, Q1, Q2 ∈ Split(S) where Q1 ≠ Q2, and
Q1 ⊆ C, Q2 ⊆ C for some C ∈ Comp(S) with the property

CSup(S) < Σ(CSup[C]).

Then there exists a model (A, α, Ξ) for T such that

α(S1) ∩ α(S2) ≠ ∅.


Proof. We use the construction described in the proof of
Lemma 6, except for step 1 of the construction of the model
M_S for the subtree T_S with root S, for which we do the following.
From K(S) ≤ CSup(S) < Σ(CSup[C]), we conclude that the
choice of cardinalities t_Q for Q ∈ 𝒞 in the proof of Lemma 6 is
such that Σ_{Q ∈ 𝒞} t_Q > K(S), by considering two cases.


1. There exists Q ∈ 𝒞 such that
   t_Q = min(K(S), Σ(CSup[Q])) = K(S).
   Then we choose Q_i ∈ {Q1, Q2} such that Q_i ≠ Q. Since
   CInf(S_i) > 0 ∧ S_i ⇝* S_i′ ⇝ S, repeatedly applying C4 and
   using C5, we have
   • Σ(CSup[Q_i]) ≥ CSup(S_i′) ≥ CInf(S_i′) ≥ CInf(S_i) > 0
   • K(S) ≥ CInf(S) ≥ CInf(S_i) > 0
   As a consequence t_{Q_i} = min(K(S), Σ(CSup[Q_i])) > 0 and
   Σ_{Q ∈ 𝒞} t_Q ≥ t_Q + t_{Q_i} > K(S).

2. For all Q ∈ 𝒞, t_Q = Σ(CSup[Q]). Then Σ_{Q ∈ 𝒞} t_Q =
   Σ(CSup[C]) > CSup(S) ≥ K(S).

Because Σ_{Q ∈ 𝒞} t_Q > K(S), we can apply Remark 2 and choose
one element x1 in α1(S1) in the model M_{S1′} built for T_{S1′}, and
an element x2 in α2(S2) in the model built for T_{S2′}, and decide
to merge these elements in step 1 of the construction. We
know that such elements x1, x2 exist because CInf(S1) > 0 and
CInf(S2) > 0. ∎


LEMMA 20 (Isolated element). Let T be a weakly consistent i-tree
and (𝒬1, ⇝) a subtree of (𝒮, ⇝) with the same root S_R as
T, such that for all S ∈ 𝒬1, all the following conditions hold:


1. CInf(S) > 0
2. ∀C ∈ Comp(S). ∃=1 Q ∈ Split(S). Q ⊆ C ∧ |Q ∩ 𝒬1| = 1
3. ∀Q ∈ Split(S). |Q ∩ 𝒬1| ≠ 1 ⇒
   (|Q ∩ 𝒬1| = 0 ∧ Σ(CInf[Q]) < CInf(S)).


Then we can construct a model for T such that

∃x ∈ α(S_R). ∀S ∈ 𝒮. (x ∈ α(S) ⇔ S ∈ 𝒬1)
Proof. We use a variation of the construction in the proof of
Lemma 6. We apply the assumptions about the subtree 𝒬1 to show
that we always have enough "slack" to avoid merging one specific
element from 𝒬1 with the elements of neighbors. We begin by
describing a slightly modified Step 1 of the proof of Lemma 6.

Step 1′ (for nodes of 𝒬1). Consider the root S_R ∈ 𝒬1. Let
C ∈ Comp(S_R), and 𝒞 ⊆ Split(S_R) such that C = ∪𝒞. By
construction, there exists a unique son S′ ∈ 𝒬1 of S_R and a
corresponding split view Q′ such that S′ ∈ Q′ and 𝒞 = {Q′} ⊎ 𝒞0.
We define t_{Q′} = min(k, Σ(CSup[Q′])) and for each Q0 ∈ 𝒞0, we
define t_{Q0} =def min(k − 1, Σ(CSup[Q0])). For each Q0 ∈ 𝒞0, since
k ≥ CInf(S_R) by choice of k and CInf(S_R) > Σ(CInf[Q0]) by
hypothesis 3 on T, we have Σ(CInf[Q0]) ≤ k − 1, so

Σ(CInf[Q0]) ≤ t_{Q0} ≤ Σ(CSup[Q0])     (H1)

and the property (H1) also holds for t_{Q′} because t_{Q′} is defined
as in Lemma 6.

By definition of t we clearly have max_{Q ∈ 𝒞} t_Q ≤ k. We next
show Σ_{Q ∈ 𝒞} t_Q ≥ k by considering the following cases.

• t_{Q′} = k. Then the claim is obvious.

• For all Q ∈ 𝒞 we have t_Q = Σ(CSup[Q]). The claim follows
from C3 and the choice of k because Σ_{Q ∈ 𝒞} t_Q ≥ Σ(CSup[C]) ≥
CSup(S_R) ≥ k.

• There exists Q0 ∈ 𝒞0 such that t_{Q0} = k − 1. Using
Σ(CSup[Q′]) ≥ CSup(S′) > 0 and k ≥ CInf(S_R) > 0,
we obtain t_{Q′} > 0, so

Σ_{Q ∈ 𝒞} t_Q ≥ t_{Q0} + t_{Q′} ≥ (k − 1) + 1 ≥ k.

We finally obtain

max_{Q ∈ 𝒞} t_Q ≤ k ≤ Σ_{Q ∈ 𝒞} t_Q     (H2)

By definition of all t_Q, we then have

∀Q0 ∈ 𝒞0. t_{Q0} < k     (H3)

According to Remark 2, (H3) allows us to choose an element of the
model M_{S′} constructed for the subtree T_{S′} and decide not to merge
it with any other element. This observation allows us to recursively
enforce x ∈ α(S) ⇔ S ∈ 𝒬1.

Indeed, consider a node S_R ∈ 𝒬1 and let {S_R^1, ..., S_R^p} =
Sons(S_R) ∩ 𝒬1 be its sons in 𝒬1. For each i, we can then
recursively ensure x_i ∈ α(S) ⇔ S ∈ 𝒬1 for each S in the S_R^i
subtree, i.e. for each S for which S ⇝* S_R^i. By definition of 𝒬1,
each S_R^i is in a different complete view, so we can apply bijections
to the submodels (Remark 4) and let σ1(x1) = ... = σp(xp) = x.
We ensure that x does not belong to any subtree rooted at a node
S0 ∈ Sons(S_R) \ 𝒬1, using Remark 2 to make sure that x is not
merged with any of the elements of α(S0), which is possible thanks
to (H3). Finally, for the base case, when S has no sons, we pick x to
be a fresh element, which is possible by assumption 3 on the
subtree 𝒬1, as noted in Remark 3. ∎


A.4.2 Links between weak and strong consistency 


Lemma 8 (Bounds Refinement) Let T be a strongly consistent
i-tree, S ∈ 𝒮, i, s such that CInf(S) ≤ i ≤ s ≤ CSup(S), let
T′ = T[CInf(S) ↦ i, CSup(S) ↦ s] and T_NF = R_W*(T′). Then
1) T_NF ≠ ⊥_d, 2) T_NF ⊨ T, and 3) if ¬(S ⇝* S0), then

(CInf(S0), CSup(S0)) = (CInf_NF(S0), CSup_NF(S0)).
Proof.


1. We prove this result by induction on the depth of S in the tree
(𝒮, ⇝). The key step of this proof is to show that the application
of UpSup and/or UpInf to the father S′ of S does not produce
a situation where a5 holds in the resulting diagram T″ (and
therefore the rule Error is not applicable in T″). We distinguish
three different cases:


• When UpSup and UpInf are both applicable to this node
S′ we have CInf″(S′) ≤ CSup″(S′) because for C ∈
Comp(S′), Q ∈ Split(S′) such that Q ⊆ C, Q = {S} ⊎ Q0,
C = {S} ⊎ C0 and T′ rewritten into T″ by UpSup and UpInf,
we have

CInf″(S′) = Σ(CInf′[Q0]) + CInf′(S)      (by b4)
          ≤ Σ(CInf′[Q0]) + CSup′(S)      (i ≤ s)
          ≤ Σ(CSup′[Q0]) + CSup′(S)      (by C5)
          ≤ Σ(CSup′[C0]) + CSup′(S)      (Q0 ⊆ C0)
          = CSup″(S′)                    (by b3)


• When only UpSup is applicable to this node S′ we have
CInf″(S′) ≤ CSup″(S′) because for C ∈ Comp(S′),
C = {S} ⊎ C0 and T′ rewritten into T″ by UpSup, we have

CSup″(S′) = Σ(CSup′[C0]) + CSup′(S)      (by b3)
          ≥ Σ(CSup′[C0]) + CInf′(S)      (i ≤ s)
          ≥ CInf′(S′)                    (by C6)
          = CInf″(S′)


• When only UpInf is applicable to this node S′ we have
CInf″(S′) ≤ CSup″(S′) because for Q ∈ Split(S′),
Q = {S} ⊎ Q0 and T′ rewritten into T″ by UpInf, we have

CInf″(S′) = Σ(CInf′[Q0]) + CInf′(S)      (by b4)
          ≤ Σ(CInf′[Q0]) + CSup′(S)      (i ≤ s)
          ≤ CSup′(S′)                    (by C7)
          = CSup″(S′)


2. Follows easily from the hypothesis CInf(S) ≤ i ≤ s ≤
CSup(S) and the fact that R_W is semantics preserving.


3. It is enough to notice that only rules UpInf and UpSup are used
when applying R_W, and these rules are applied in the bottom-up
direction. ∎


Lemma 9 (Parallel Bounds Refinement (1)) Let T be a strongly
consistent i-tree, and (𝒬0, ⇝) a subtree of T which has the same
root as T and is such that


• The nodes of 𝒬0 are pairwise independent, that is
∀S1, S2 ∈ 𝒬0. ¬(disj_T(S1, S2))

Then the i-tree T′ defined by

T′ =def T[∀S ∈ 𝒬0 : CInf(S) ↦ CSup(S)]

is such that its R_W normal form T_NF = NF(T′) satisfies

1. T_NF ≠ ⊥_d

2. T_NF ⊨ T

3. ∀S ∈ 𝒬0. ∀Q ∈ Split_NF(S). Q ∩ 𝒬0 = ∅ ⇒
   Σ(CInf_NF[Q]) ≤ CInf_NF(S)


Proof. If we apply [CInf(S′) ↦ CSup(S′)] to every node S′ of 𝒬0
starting from the root to the leaves, we always maintain C1, C2, C3
because Φ and CSup are never modified. We also maintain C4
because for each S ∈ 𝒬0 and each view Q ∈ Split(S) such that
there exists S′ ∈ Q ∩ 𝒬0, by ¬disj_T(S1, S2), we know that S′ is
the only modified node, and

CInf′(S) = CSup(S)
         ≥ Σ(CInf[Q − {S′}]) + CSup(S′)     (by C7)
         = Σ(CInf′[Q])

Therefore, T′ is already in normal form, so T_NF is identical to T′
and is clearly distinct from ⊥_d, proving condition 1. Condition 2
holds because T′ ⊨ T, since the cardinality bounds in T′ are at
least as strong as in T. Condition 3 holds because

Σ(CInf′[Q]) = Σ(CInf[Q])     (because Q ∩ 𝒬0 = ∅)
            ≤ CSup(S)        (by C8)
            = CInf′(S)       (by definition of T′)


LEMMA 21 (Parallel Bounds Refinement (2)). Let T be a strongly
consistent i-tree and S, S1, S2, S1′, S2′ ∈ 𝒮 and C, Q1, Q2 such that

C ∈ Comp(S), Q1, Q2 ∈ Split(S)
S1 ⇝* S1′ ∧ S1′ ∈ Q1 ∧ Q1 ⊆ C
S2 ⇝* S2′ ∧ S2′ ∈ Q2 ∧ Q2 ⊆ C
Q1 ≠ Q2

Define

T′ =def T[∀S″, S1 ⇝* S″ ⇝* S1′ : CInf(S″) ↦ CSup(S″)]
         [∀S″, S2 ⇝* S″ ⇝* S2′ : CInf(S″) ↦ CSup(S″)]
T_NF = R_W*(T′)
T″ = R_W*(T_NF[CSup(S) ↦ CInf_NF(S)])

Then T″ ≠ ⊥_d, T″ ⊨ T, and CSup″(S) < Σ(CSup″[C]).


Proof. As in the proof of Lemma 9, weak consistency conditions hold in $T'$ for all nodes $S'$ on the paths $S_1 \leadsto S'_1$ and $S_2 \leadsto S'_2$. Therefore, the only rewrite step that may be applicable in $T'$ is the application of UpInf to $S$. This application may lead to applications of other instances of UpInf, but the proof of Lemma 8 shows that this process will result in a weakly consistent i-tree, so $T_{NF} \neq \bot_d$. Moreover, the process of computing $\mathrm{NF}(T_{NF}[\mathrm{CSup}''(S) \leftarrow \mathrm{CInf}_{NF}(S)])$ is identical to applying Lemma 8 to $T'$ with bounds $i = s = \mathrm{CInf}_{NF}(S)$, and therefore leads to a weakly consistent i-tree $T''$, so $T'' \neq \bot_d$.

The condition $T'' \models T$ follows because $R_W$ is semantics-preserving, and the updates of trees only shrink the bounds on nodes, so they convert a diagram into a stronger one.

To prove $\mathrm{CSup}''(S) < \Sigma(\mathrm{CSup}''[C])$, observe first that $\mathrm{CSup}''(S) = \mathrm{CInf}_{NF}(S)$ by definition of $T''$, and $\Sigma(\mathrm{CSup}''[C]) = \Sigma(\mathrm{CSup}[C])$ because CSup does not change for any ancestors of $S$. Therefore, it suffices to show

\[
\mathrm{CInf}_{NF}(S) < \Sigma(\mathrm{CSup}[C])
\]

We prove this condition by distinguishing two cases.

1. UpInf is not applicable to $S$. Then $\mathrm{CInf}_{NF}(S) = \mathrm{CInf}(S)$ and the condition follows from the strong consistency of $T$.


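2. UpInf is applicable to $S$. Then for some $a, b$ where $\{a, b\} = \{1, 2\}$ we have a chain of inequalities that starts from $\mathrm{CInf}_{NF}(S)$ and bounds it by $\Sigma(\mathrm{CSup}[C])$. [The intermediate steps of this chain are illegible in the source.]
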
A.4.3 Completeness of the algorithm Subsumes


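Theorem 4 Let $T$ be a strongly consistent i-tree and let $T \vdash A$, for an atomic formula $A$, be as defined in Figure 12. Then $T \vdash A$ if and only if $T \models A$.

The ($\Rightarrow$) direction of Theorem 4 is trivial by the semantics of i-diagrams. For the ($\Leftarrow$) direction we prove the following characterizations:

\[
\begin{array}{rcl}
S \neq \emptyset_d &\Rightarrow& \exists M.\ \alpha(S) \neq \emptyset\\
k \in [\mathrm{CInf}(S), \mathrm{CSup}(S)] &\Rightarrow& \exists M.\ |\alpha(S)| = k\\
\neg(\mathrm{Included}(S_0, C, T)) &\Rightarrow& \exists M.\ \alpha(S_0) \not\subseteq \bigcup \alpha[C]\\
S_1 \neq \emptyset_d \wedge \neg(S_1 \leadsto S_2) &\Rightarrow& \exists M.\ \alpha(S_1) \not\subseteq \alpha(S_2)\\
S_1 \neq S_2 &\Rightarrow& \exists M.\ \alpha(S_1) \neq \alpha(S_2)\\
\emptyset_d \notin \{S_1, S_2\} \wedge \neg\mathrm{disj}_T(S_1, S_2) &\Rightarrow& \exists M.\ \alpha(S_1) \cap \alpha(S_2) \neq \emptyset\\
+P \notin \Phi(S) &\Rightarrow& \exists M.\ \alpha(S) \not\subseteq \Sigma(P)\\
-P \notin \Phi(S) &\Rightarrow& \exists M.\ \alpha(S) \not\subseteq \Sigma(P)^c
\end{array}
\]

where $M$ denotes a model $M = (\Delta, \alpha, \Sigma)$ of $T$. We next present the remaining lemmas that prove these characterizations (see also Section 7).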


Lemma 12 If an i-tree $T$ is strongly consistent, $S_0 \in \mathcal{S}$, $C \in \mathcal{P}(\mathcal{S})$, and $\mathrm{Included}(S_0, C, T)$ returns false, then

\[
\exists M.\ \alpha(S_0) \not\subseteq \bigcup \alpha[C]
\]


Proof. Let $Q_0$ be the smallest set of nodes such that:

\[
\begin{array}{l}
S_0 \leadsto S \Rightarrow S \in Q_0\\
S \in Q_0 \wedge Q_1 \in \mathrm{Comp}(S) \wedge S_1 \in Q_1 \wedge \neg\mathrm{Incl}(S_1, C) \Rightarrow S_1 \in Q_0
\end{array}
\]

By definition of Incl we know that $Q_0 \cap C = \emptyset$.

$Q_0$ is tree-shaped by construction, but may contain two nodes which are explicitly disjoint. We compute a subtree $Q_1$ of $Q_0$ by starting from the root and keeping at most one son for each complete view. We also require that $Q_1$ contains $S_0$, by never cutting the branch that leads to $S_0$.
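
The pruning of $Q_0$ into $Q_1$ can be pictured with a small sketch. The Python below is only an illustration under assumptions that are not part of the paper: nodes are plain strings, `sons` and `views` are hypothetical dictionaries giving each node's sons and its complete views, `q0` is the candidate node set, and `keep_path` is the branch that must not be cut. A son is kept only when no complete view containing it already holds a kept son, so every complete view retains at most one son.

```python
def prune_one_son_per_view(root, sons, views, q0, keep_path):
    """Return a subset of q0, rooted at `root`, with at most one kept son
    per complete view; sons on `keep_path` are considered first."""
    kept = {root}
    frontier = [root]
    while frontier:
        node = frontier.pop()
        chosen = []
        # Sons on the protected branch sort first, so they are never cut.
        candidates = sorted(
            (s for s in sons.get(node, []) if s in q0),
            key=lambda s: (s not in keep_path, s),
        )
        for s in candidates:
            # Reject s if some complete view containing s already has a kept son.
            clash = any(
                s in view and any(c in view for c in chosen)
                for view in views.get(node, [])
            )
            if not clash:
                chosen.append(s)
        kept.update(chosen)
        frontier.extend(chosen)
    return kept


# Hypothetical example data (not taken from the paper).
sons = {"r": ["a", "b", "c"], "a": ["d", "e"]}
views = {"r": [{"a", "b"}, {"b", "c"}], "a": [{"d", "e"}]}
q0 = {"r", "a", "b", "c", "d", "e"}
keep_path = {"r", "a", "e"}  # branch from the root down to the protected node "e"

kept = prune_one_son_per_view("r", sons, views, q0, keep_path)
```

In this example the protected branch `r`, `a`, `e` survives, `b` is dropped because it shares a complete view with `a`, and `d` is dropped because it shares one with `e`.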

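We then define

\[
T' = \mathrm{NF}(T[\forall S \in Q_1 : \mathrm{CInf}(S) \leftarrow \mathrm{CSup}(S)])
\]

According to Lemma 9 (Parallel Bounds Refinement) we have $T' \neq \bot_d$.

We then apply Lemma 20 to construct a model for the weakly consistent i-tree $T'$ such that

\[
\forall S' \in \mathcal{S}.\ (x \in \alpha(S') \iff S' \in Q_1)
\]

for some element $x \in \alpha(S_0)$. Because $S_0 \in Q_1$, we have $x \in \alpha(S_0)$. Because $C \cap Q_1 = \emptyset$, we have $x \notin \bigcup \alpha[C]$. ∎
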
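Lemma 15 If an i-tree $T$ is strongly consistent, then for all $S \in \mathcal{S}$ and $P \in PN$ we have

\[
\begin{array}{rcl}
(+P) \notin \Phi(S) &\Rightarrow& \exists M.\ M \models T \wedge \alpha_M(S) \not\subseteq \Sigma(P)\\
(-P) \notin \Phi(S) &\Rightarrow& \exists M.\ M \models T \wedge \alpha_M(S) \not\subseteq \Sigma(P)^c
\end{array}
\]

Proof. Let $S \in \mathcal{S}$ be such that $(+P) \notin \Phi(S)$. We define $Q_{+P} \stackrel{\mathrm{def}}{=} \{S' \in \mathcal{S} \mid (+P) \in \Phi(S')\}$. Using $C_{10}$ we show that $\mathrm{Included}(S, Q_{+P}, T)$ cannot return true. Then, by Lemma 12, there exists a model such that $\alpha(S) \not\subseteq \bigcup \alpha[Q_{+P}]$. We then change the model by redefining $\Sigma'$ by:

\[
\forall P \in PN.\ \Sigma'(P) = \bigcup \{\alpha(S') \mid S' \in Q_{+P}\}
\]

to ensure that $\Sigma'(P) \not\supseteq \alpha(S)$. The case of $(-P) \notin \Phi(S)$ is analogous, by taking a model such that $\alpha(S) \not\subseteq \bigcup \alpha[Q_{-P}]$ and redefining $\Sigma'$ by:

\[
\forall P \in PN.\ \Sigma'(P) = \Delta - \bigcup \{\alpha(S') \mid S' \in Q_{-P}\}
\]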
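
The two redefinitions of $\Sigma'$ can be checked on a toy model. The following Python sketch is illustrative only: the dictionary `alpha`, the node names, and the helper functions are hypothetical and stand in for the interpretation of nodes as finite sets. It shows that when the set of $S$ is not covered by the union over $Q_{+P}$, taking that union as $\Sigma'(P)$ keeps every node of $Q_{+P}$ inside $\Sigma'(P)$ while $S$ escapes it, and symmetrically for the complement-based definition.

```python
def sigma_positive(alpha, q_plus):
    """Sigma'(P) as the union of alpha(S') over S' in Q_{+P}."""
    result = set()
    for node in q_plus:
        result |= alpha[node]
    return result


def sigma_negative(domain, alpha, q_minus):
    """Sigma'(P) as the domain minus the union of alpha(S') over S' in Q_{-P}."""
    return domain - sigma_positive(alpha, q_minus)


# Hypothetical toy model: alpha("S") is not covered by alpha("A") | alpha("B").
domain = {1, 2, 3, 4}
alpha = {"S": {1, 2}, "A": {2, 3}, "B": {3}}
q = {"A", "B"}

sigma_p = sigma_positive(alpha, q)
s_escapes_positive = not (alpha["S"] <= sigma_p)
members_inside = all(alpha[n] <= sigma_p for n in q)

sigma_n = sigma_negative(domain, alpha, q)
complement = domain - sigma_n
s_escapes_negative = not (alpha["S"] <= complement)
members_disjoint = all(alpha[n].isdisjoint(sigma_n) for n in q)
```

The sketch does not check the remaining model conditions, which the proof obtains from the consistency conditions on $T$.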


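Lemma 16 If an i-tree $T$ is strongly consistent, then for all $S_1, S_2 \in \mathcal{S}$ such that $S_1 \neq \emptyset_d$, $S_2 \neq \emptyset_d$ we have

\[
\neg(\mathrm{disj}_T(S_1, S_2)) \Rightarrow \exists M.\ \alpha(S_1) \cap \alpha(S_2) \neq \emptyset.
\]

Proof. Let $S_1, S_2 \in \mathcal{S} \setminus \{\emptyset_d\}$ be such that $\neg(\mathrm{disj}_T(S_1, S_2))$. If $S_1 = S_2$ we can find a model $M$ where $\alpha(S_1) = \alpha(S_2) \neq \emptyset$ by Lemma 10; in this model $\alpha(S_1) \cap \alpha(S_2) = \alpha(S_2) \neq \emptyset$. Suppose $S_1 \neq S_2$. Define $S'_1, S'_2, S_0$ as the unique nodes such that $S_0$ is the least common ancestor of $S_1$ and $S_2$ in $T$, and $S'_1, S'_2 \in \mathrm{Sons}(S_0)$ are the ancestors of $S_1$ and $S_2$, respectively. We distinguish two cases:

• $S'_1$ and $S'_2$ do not belong to the same complete view of $S_0$. Then apply Lemma 9 to the subtree

\[
Q_0 \stackrel{\mathrm{def}}{=} \{S \mid S_1 \leadsto S \vee S_2 \leadsto S\}
\]

whose nodes are pairwise independent by the hypothesis $\neg\mathrm{disj}_T(S_1, S_2)$. The resulting tree $T_{NF}$ satisfies the hypothesis of Lemma 18, so there exists a model $M = (\Delta, \alpha, \Sigma)$ for $T_{NF}$ such that

\[
\alpha(S_1) \cap \alpha(S_2) \neq \emptyset.
\]

$M$ is also a model of $T$ because $T_{NF} \models T$.

• $S'_1$ and $S'_2$ belong to the same complete view $C$. Define

\[
\begin{aligned}
T' &\stackrel{\mathrm{def}}{=} \mathrm{NF}(T[\forall S', S_1 \leadsto S' \leadsto S'_1 : \mathrm{CInf}(S') \leftarrow \mathrm{CSup}(S')]\\
&\qquad\qquad [\forall S', S_2 \leadsto S' \leadsto S'_2 : \mathrm{CInf}(S') \leftarrow \mathrm{CSup}(S')])\\
T'' &\stackrel{\mathrm{def}}{=} \mathrm{NF}(T'[\mathrm{CSup}''(S) \leftarrow \mathrm{CInf}'(S)])
\end{aligned}
\]

By Lemma 21 (applied with $S = S_0$), we then have $T'' \models T$ and $\mathrm{CSup}''(S) < \Sigma(\mathrm{CSup}''[C])$. This last property allows us to apply Lemma 19 and prove the existence of a model $(\Delta, \alpha, \Sigma)$ for $T$ such that $\alpha(S_1) \cap \alpha(S_2) \neq \emptyset$. ∎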